Transgenic Cells Expressing Glucosyltransferase Nucleic Acids

The invention relates to transgenic cells which have been transformed with
glucosyltransferase (GTase) nucleic acids.

GTases are enzymes which post-translationally transfer glucosyl residues from an activated
nucleotide sugar to monomeric and polymeric acceptor molecules such as other sugars,
proteins, lipids and other organic substrates. These glucosylated molecules take part in
diverse metabolic pathways and processes. The transfer of a glucosyl moiety can alter the
acceptor's bioactivity, solubility and transport properties within the cell and throughout the
plant. One family of GTases in higher plants is defined by the presence of a C-terminal
consensus sequence. The GTases of this family function in the cytosol of plant cells and
catalyse the transfer of glucose to small molecular weight substrates, such as phenylpropanoid
derivatives, coumarins, flavonoids, other secondary metabolites and molecules known to act
as plant hormones. Available evidence indicates that GTase enzymes can be highly specific,
such as the maize and Arabidopsis GTases that glucosylate indole-3-acetic acid (IAA).

The production and use of paper has increased in the last 10 years. For example, between
1989 and 1999 the production of paper and board in the UK increased from 4.6 to 6.6
million tonnes. Worldwide consumption has also reflected a general increase in paper usage.
For example, in the UK per capita consumption of paper is over 200 kg per annum. In the
USA this figure is over 300 kg per annum.

Wood used in the paper industry is initially particulated, typically by chipping, before
conversion to a pulp which can be utilised to produce paper. The pulping process involves
the removal of lignin. Lignin is a major non-carbohydrate component of wood and comprises
approximately one quarter of the raw material in wood pulp. The removal of lignin is
desirable since the quality of the paper produced from the pulp is largely determined by the
lignin content. Many methods have been developed to efficiently and cost effectively remove
lignin from wood pulp. These methods can be chemical, mechanical or biological. For
example, chemical methods to pulp wood are disclosed in WO9811294, EP0957198 and
WO0047812. Although chemical methods are efficient means to remove lignin from pulp, it
is known that chemical treatments can result in degradation of polysaccharides and are
expensive. Moreover, to remove residual lignin from pulp it is necessary to use strong
bleaching agents which require removal before the pulp can be converted into paper. These
agents are also damaging to the environment.

Biological methods to remove lignin are known. There are however disadvantages associated
with such methods. For example, it is important to provide micro-organisms (e.g. bacteria
and/or fungi) which only secrete ligninolytic enzymes which do not affect cellulose fibres.
This method is also very time consuming (it can take 3-4 weeks) and expensive due to the need
to provide bioreactors. Biological treatment can also include pre-treatment of wood chips to
make them more susceptible to further biological or chemical pulping.

It is therefore desirable to provide further means by which lignin can be efficiently and cost
effectively removed from wood pulp which do not have the disadvantages associated with
prior art methods.

For the sake of clarity, reference herein to transgenic means a plant which has been genetically
modified to include a nucleic acid sequence not naturally found in said plant. For example, by
over-expression of monolignol glucosyltransferases in planta, plant cell wall properties may
be altered through increasing the flux through biosynthetic intermediates that are obligatory
for incorporation and assembly of the lignin polymer. Conversely, reduction of the
monolignol glucoside pools, such as through the use of nucleic acid comprising GTase
sequences in antisense configuration, may lead to altered properties through reducing the flux
through specific intermediates. Changes in lignin composition, such as decreased ratios
of coniferyl alcohol to sinapyl alcohol, are highly desirable in paper and pulping processes,
because the more highly methylated lignin (sinapyl alcohol) is more easily removed during
pulping processes (Chiang et al. (1988) TAPPI J. 71, 173-176).

In some applications it may be desirable to change lignin composition and increase the lignin
content of a plant cell to increase the mechanical strength of wood. This would have utility
in, for example, the construction industry or in furniture making.

Both lignin content and the level of cross-linking of polysaccharide polymers within plant cell
walls also play an important role in determining texture and quality of raw materials through
altering the cell wall and tissue mechanical properties. For example, there is considerable
interest in reducing cell separation in edible tissues since this would prevent over-softening
and loss of juiciness. Phenolics, such as ferulic acid, play an important role in cell adhesion
since they can be esterified to cell wall polysaccharides during synthesis and oxidatively
cross-linked in the wall, thereby increasing rigidity. Most non-lignified tissues contain these
phenolic components and their levels can be modified by altering flux through the same
metabolic pathways as those culminating in lignin. Therefore, in the same way as for the
manipulation of lignin composition and content, GTase nucleic acid in sense and/or antisense
configurations can be used to affect levels of ferulic acid and related phenylpropanoid
derivatives that function in oxidative cross-linking. These changes in content have utility in
the control of raw material quality of edible plant tissues.

Lignin and oxidative cross-linking in plant cell walls also play important roles in stress and 
defence responses of most plant species. For example, when non-woody tissues are 
challenged by pests or pathogen attack, or suffer abiotic stress such as through mechanical 
damage or UV radiation, the plant responds by localised and systemic alteration in cell wall 
and cytosolic properties, including changes in lignin content and composition and changes in 
cross-linking of other wall components. Therefore, it can also be anticipated that cell- or 
tissue-specific changes in these responses brought about by changed levels of the relevant 
GTase activities will have utility in protecting the plant against biotic attack and biotic/abiotic
stresses.

GTases also have utility with respect to the modification of antioxidants. Reactive oxygen
species are produced in all aerobic organisms during respiration and normally exist in a cell in
balance with biochemical anti-oxidants. Environmental challenges, such as by pollutants,
oxidants, toxicants, heavy metals and so on, can lead to excess reactive oxygen species which
perturb the cellular redox balance, potentially leading to wide-ranging pathological
conditions. In animals and humans, cardiovascular diseases, cancers, inflammatory and
degenerative disorders are linked to events arising from oxidative damage.

Because of the current prevalence of these diseases, there is considerable interest in
anti-oxidants, consumed in the diet or applied topically such as in UV-screens. Anti-oxidant
micronutrients obtained from vegetables and fruits, teas, herbs and medicinal plants are
thought to provide significant protection against health problems arising from oxidative stress.
Well known anti-oxidants from plant tissues include, for example, quercetin, luteolin, and the
catechin, epicatechin and cyanidin groups of compounds.

Caffeic acid (3,4-dihydroxycinnamic acid) is a further example of an anti-oxidant with
beneficial therapeutic properties.

Certain plant species, organs and tissues are known to have relatively high levels of one or
more compounds with anti-oxidant activity. Greater accumulation of these compounds in
those species, their wider distribution in crop plants and plant parts already used for food and
drink production, and the increased bioavailability of anti-oxidants (absorption, metabolic
conversions and excretion rate) are three features considered to be highly desirable.

It will be apparent that changed levels of the relevant GTase activities capable of
glucosylating antioxidant compounds in planta will allow the production of anti-oxidants
with beneficial properties. GTase sequences can also be expressed in prokaryotes or simple
eukaryotes, such as yeast, to produce enzymes for biotransformations in those cells, or as in
vitro processing systems.

Statements of Invention

According to an aspect of the invention there is provided a transgenic cell comprising a
nucleic acid molecule which encodes a polypeptide which has:
i) glucosyltransferase activity;
ii) is selected from the group comprising sequences of Figures 1A, 2A, 3A, 4A, 5A, 6A,
7A, 8A, 9A, 10A, 11A, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32;
iii) nucleic acids which hybridise to the sequences represented in (ii) above; and
iv) nucleic acid sequences which are degenerate as a result of the genetic code to the
sequences defined in (i) and (ii) above.
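
By way of illustration only, the degeneracy referred to in (iv) can be sketched in Python. The codon table below is a deliberately partial subset of the standard genetic code, and the two short sequences and all function names are hypothetical examples; they are not sequences of the Figures.

```python
# Partial standard genetic code: only the codons used in this illustration.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon (no stop-codon handling)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two hypothetical sequences that differ only at third-codon positions.
seq_one = "ATGGCTAAA"
seq_two = "ATGGCCAAG"

# Different nucleotide sequences, identical encoded peptide.
same_peptide = translate(seq_one) == translate(seq_two)
```

The two sequences differ at two nucleotide positions yet encode the same tripeptide, which is the sense in which one sequence is degenerate to the other.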

In a further preferred embodiment of the invention said nucleic acid molecule anneals under
stringent hybridisation conditions to the sequence presented in Figures 1A, 2A, 3A, 4A, 5A,
6A, 7A, 8A, 9A, 10A, 11A, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32.

More preferably still said nucleic acid molecule is selected from Figures 7A, 8A, 9A, 10A,
15, 18, 19, 28 or 31.

Stringent hybridisation/washing conditions are well known in the art. For example, stringent
conditions select nucleic acid hybrids that are stable after washing in 0.1 x SSC, 0.1% SDS at
60°C. It is well known in the art that optimal hybridisation conditions can be calculated if the
sequence of the nucleic acid is known. For example, hybridisation conditions can be determined
by the GC content of the nucleic acid subject to hybridisation. Please see Sambrook et al. (1989)
Molecular Cloning: A Laboratory Manual. A common formula for calculating the stringency
conditions required to achieve hybridisation between nucleic acid molecules of a specified
homology is:

Tm = 81.5°C + 16.6 log[Na+] + 0.41(%G + C) - 0.63(%formamide).
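
As a minimal sketch of how the above formula may be evaluated, the following Python function computes Tm from its three inputs. It assumes that the logarithm is base 10 and that [Na+] is expressed in mol/l, as is conventional for this formula; the function name and the input values are hypothetical and chosen only for illustration.

```python
import math


def melting_temperature(na_molar, gc_percent, formamide_percent=0.0):
    """Return Tm in degrees Celsius from the formula quoted in the text.

    na_molar          -- sodium ion concentration in mol/l (assumed unit)
    gc_percent        -- G + C content of the nucleic acid, in percent
    formamide_percent -- formamide concentration, in percent
    """
    return (81.5
            + 16.6 * math.log10(na_molar)   # assumed base-10 logarithm
            + 0.41 * gc_percent
            - 0.63 * formamide_percent)


# Hypothetical conditions: 0.1 M Na+, 50% G + C, no formamide.
tm_example = melting_temperature(0.1, 50.0)
```

For these hypothetical inputs the terms are 81.5 - 16.6 + 20.5, giving a Tm of about 85.4°C; adding formamide lowers the result by 0.63°C per percent.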

In a preferred embodiment of the invention said transgenic cell is a eukaryotic cell. Preferably 
said eukaryotic cell is a plant cell or yeast cell. 

In an alternative embodiment of the invention said transgenic cell is a prokaryotic cell. 

In a further preferred embodiment of the invention the nucleic acid molecule is selected from
the group comprising: antisense sequences of the sequences of any one of Figures 1C, 2C,
3C, 4C, 5C, 6C, 7C, 8C, 9C, 10C and 11C or parts thereof, or antisense sequences of the
sense sequences presented in Figures 12-32. More preferably still said antisense sequence is
selected from Figure 7C or 9C.

In a further preferred embodiment of the invention said nucleic acid is cDNA. 

In a yet further preferred embodiment of the invention said nucleic acid is genomic DNA. 

In yet still a further preferred embodiment of the invention said plant is a woody plant 
selected from: poplar; eucalyptus; Douglas fir; pine; walnut; ash; birch; oak; teak; spruce. 
Preferably said woody plant is a plant used typically in the paper industry, for example 
poplar. 

Methods to transform woody species of plant are well known in the art. For example, the
transformation of poplar is disclosed in US4795855 and WO9118094. The transformation of
eucalyptus is disclosed in EP1050209 and WO9725434. Each of these patents is incorporated
in its entirety by reference.

In a still further preferred embodiment of the invention said plant is selected from: corn (Zea
mays), canola (Brassica napus, Brassica rapa ssp.), alfalfa (Medicago sativa), rice (Oryza
sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), sunflower
(Helianthus annuus), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana
tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium
hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.),






WO 01/59140 



PCT/GB01/00477 



coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa
(Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea
americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive
(Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia
(Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), oats,
barley, vegetables and ornamentals.

Preferably, plants of the present invention are crop plants (for example, cereals and pulses,
maize, wheat, potatoes, tapioca, rice, sorghum, millet, cassava, barley, pea, and other root,
tuber or seed crops). Important seed crops are oil-seed rape, sugar beet, maize, sunflower,
soybean, and sorghum. Horticultural plants to which the present invention may be applied
may include lettuce, endive, and vegetable brassicas including cabbage, broccoli, and
cauliflower, and carnations and geraniums. The present invention may be applied in tobacco,
cucurbits, carrot, strawberry, sunflower, tomato, pepper and chrysanthemum.

Grain plants that provide seeds of interest include oil-seed plants and leguminous plants.
Seeds of interest include grain seeds, such as corn, wheat, barley, rice, sorghum, rye, etc.
Oil-seed plants include cotton, soybean, safflower, sunflower, Brassica, maize, alfalfa, palm,
coconut, etc. Leguminous plants include beans and peas. Beans include guar, locust bean,
fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea,
etc.

According to a further aspect of the invention there is provided a vector comprising the 
nucleic acid according to the invention operably linked to a promoter. 

"Vector" includes, inter alia, any plasmid, cosmid, phage or Agrobacterium binary vector in
double or single stranded linear or circular form which may or may not be self-transmissible
or mobilizable, and which can transform a prokaryotic or eukaryotic host either by integration
into the cellular genome or by existing extrachromosomally (e.g. an autonomously replicating
plasmid with an origin of replication, i.e. an episomal vector).

Suitable vectors can be constructed, containing appropriate regulatory sequences, including
promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences,
marker genes and other sequences as appropriate. For further details see, for example,
Molecular Cloning: A Laboratory Manual, 2nd edition, Sambrook et al. 1989, Cold Spring Harbor
Laboratory Press or Current Protocols in Molecular Biology, Second Edition, Ausubel et al.
Eds., John Wiley & Sons, 1992.

Specifically included are shuttle vectors, by which is meant a DNA vehicle capable, naturally
or by design, of replication in two different host organisms, which may be selected from
actinomycetes and related species, bacteria and eukaryotic cells (e.g. higher plant, mammalian,
yeast or fungal cells).

A vector including nucleic acid according to the invention need not include a promoter or
other regulatory sequence, particularly if the vector is to be used to introduce the nucleic acid
into cells for recombination into the genome.

Preferably the nucleic acid in the vector is under the control of, and operably linked to, an
appropriate promoter or other regulatory elements for transcription in a host cell such as a
microbial (e.g. bacterial) or plant cell. The vector may be a bi-functional expression vector
which functions in multiple hosts. In the case of GTase genomic DNA this may contain its
own promoter or other regulatory elements and in the case of cDNA this may be under the
control of an appropriate promoter or other regulatory elements for expression in the host cell.

By "promoter" is meant a nucleotide sequence upstream from the transcriptional initiation site
and which contains all the regulatory regions required for transcription. Suitable promoters
include constitutive, tissue-specific, inducible, developmental or other promoters for
expression in plant cells comprised in plants depending on design. Such promoters include
viral, fungal, bacterial, animal and plant-derived promoters capable of functioning in plant
cells.

Constitutive promoters include, for example, the CaMV 35S promoter (Odell et al. (1985) Nature
313: 810-812); rice actin (McElroy et al. (1990) Plant Cell 2: 163-171); ubiquitin (Christian
et al. (1989) Plant Mol. Biol. 18: 675-689); pEMU (Last et al. (1991) Theor. Appl. Genet. 81:
581-588); MAS (Velten et al. (1984) EMBO J. 3: 2723-2730); ALS promoter (U.S.
Application Serial No. 08/409,297), and the like. Other constitutive promoters include those
in U.S. Patent Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680;
5,268,463; and 5,608,142.

Chemical-regulated promoters can be used to modulate the expression of a gene in a plant
through the application of an exogenous chemical regulator. Depending upon the objective,
the promoter may be a chemical-inducible promoter, where application of the chemical
induces gene expression, or a chemical-repressible promoter, where application of the
chemical represses gene expression. Chemical-inducible promoters are known in the art and
include, but are not limited to, the maize In2-2 promoter, which is activated by
benzenesulfonamide herbicide safeners, the maize GST promoter, which is activated by
hydrophobic electrophilic compounds that are used as pre-emergent herbicides, and the
tobacco PR-1a promoter, which is activated by salicylic acid. Other chemical-regulated
promoters of interest include steroid-responsive promoters (see, for example, the
glucocorticoid-inducible promoter in Schena et al. (1991) Proc. Natl. Acad. Sci. USA 88:
10421-10425 and McNellis et al. (1998) Plant J. 14(2): 247-257) and tetracycline-inducible
and tetracycline-repressible promoters (see, for example, Gatz et al. (1991) Mol. Gen. Genet.
227: 229-237, and US Patent Nos. 5,814,618 and 5,789,156, herein incorporated by reference).

Where enhanced expression in particular tissues is desired, tissue-specific promoters can be
utilised. Tissue-specific promoters include those described by Yamamoto et al. (1997) Plant
J. 12(2): 255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7): 792-803; Hansen et al.
(1997) Mol. Gen. Genet. 254(3): 337-343; Russell et al. (1997) Transgenic Res. 6(2): 157-168;
Rinehart et al. (1996) Plant Physiol. 112(3): 1331-1341; Van Camp et al. (1996) Plant
Physiol. 112(2): 525-535; Canevascini et al. (1996) Plant Physiol. 112(2): 513-524;
Yamamoto et al. (1994) Plant Cell Physiol. 35(5): 773-778; Lam (1994) Results Probl. Cell
Differ. 20: 181-196; Orozco et al. (1993) Plant Mol. Biol. 23(6): 1129-1138; Matsuoka et al.
(1993) Proc. Natl. Acad. Sci. USA 90(20): 9586-9590; and Guevara-Garcia et al. (1993) Plant
J. 4(3): 495-505.

"Operably linked" means joined as part of the same nucleic acid molecule, suitably positioned
and oriented for transcription to be initiated from the promoter. DNA operably linked to a
promoter is "under transcriptional initiation regulation" of the promoter. In a preferred aspect,
the promoter is an inducible promoter or a developmentally regulated promoter.

Of particular interest in the present context are nucleic acid constructs which operate as plant
vectors. Specific procedures and vectors previously used with wide success upon plants are
described by Guerineau and Mullineaux (1993) (Plant transformation and expression vectors.
In: Plant Molecular Biology Labfax (Croy RRD ed.) Oxford, BIOS Scientific Publishers, pp
121-148). Suitable vectors may include plant viral-derived vectors (see e.g. EP-A-194809).
If desired, selectable genetic markers may be included in the construct, such as those that
confer selectable phenotypes such as resistance to antibiotics or herbicides (e.g. kanamycin,
hygromycin, phosphinothricin, chlorsulfuron, methotrexate, gentamycin, spectinomycin,
imidazolinones and glyphosate).

According to a further aspect of the invention there is provided a method of enhancing
monolignol glucoside synthesis in a plant comprising causing or allowing expression of at
least one GTase nucleic acid according to the invention in a plant. Preferably the plant is a
woody plant species.

According to a further aspect of the invention there is provided a method of inhibiting
monolignol glucoside synthesis in a plant comprising causing or allowing expression of at
least one GTase antisense nucleic acid according to the invention in a plant. Preferably the
plant is a woody plant species.

Inhibition of GTase expression may, for instance, be achieved using anti-sense technology. 

In using anti-sense genes or partial gene sequences to down-regulate gene expression, a
nucleotide sequence is placed under the control of a promoter in a "reverse orientation" such
that transcription yields RNA which is complementary to normal mRNA transcribed from the
"sense" strand of the target gene. See, for example, Rothstein et al., 1987; Smith et al. (1988)
Nature 334, 724-726; Zhang et al. (1992) The Plant Cell 4, 1575-1588; English et al. (1996)
The Plant Cell 8, 179-188. Antisense technology is also reviewed in Bourque (1995) Plant
Science 105, 125-149, and Flavell (1994) PNAS USA 91, 3490-3496.
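
By way of illustration only, the "reverse orientation" relationship described above can be sketched as the reverse complement of a sense-strand sequence. The function name and the six-nucleotide example are hypothetical and do not correspond to any sequence of the Figures.

```python
# Watson-Crick pairing for DNA, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(sense_dna):
    """Return the reverse complement of a sense-strand DNA sequence, 5' to 3'."""
    return sense_dna.translate(COMPLEMENT)[::-1]


# Hypothetical sense fragment and its antisense counterpart.
sense_fragment = "ATGGCC"
antisense_fragment = antisense(sense_fragment)
```

Applying the function twice returns the original sequence, reflecting that sense and antisense strands are mutually complementary.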

According to a further aspect of the invention there is provided a nucleotide sequence
encoding an antisense RNA molecule complementary to a sense mRNA molecule encoding
a polypeptide having a glucosyl transferase activity in the biosynthesis of at least a
monolignol glucoside in lignin biosynthesis in a plant, which nucleotide sequence is under
transcriptional control of a promoter and a terminator, both promoter and terminator capable
of functioning in plant cells.

Suitable promoters and terminators are referred to hereinabove.

According to a further aspect of the invention there is provided a nucleotide sequence
according to the invention comprising a transcriptional regulatory sequence, a sequence under
the transcriptional control thereof which encodes an RNA which consists of a plurality of
subsequences, characterised in that the RNA subsequences are antisense RNAs to mRNAs of
proteins having a GTase activity in the lignin biosynthesis pathway in plant cells.

In particular, the said RNA subsequences are antisense RNAs to mRNAs of GTase having a
GTase activity in the lignin biosynthesis pathway in plant cells, such as the GTase of
Figures 1C to 11C.

The nucleotide sequence may encode an RNA having any number of subsequences.
Preferably, the number of subsequences lies between 2 and 7 (inclusive) and more preferably
lies between 2 and 4.

According to a further aspect of the invention there is provided a host cell transformed with
nucleic acid or a vector according to the invention, preferably a plant or a microbial cell. The
microbial cell may be prokaryotic (e.g. Escherichia coli, Bacillus subtilis) or eukaryotic (e.g.
Saccharomyces cerevisiae).

In the transgenic plant cell the transgene may be on an extra-genomic vector or
incorporated, preferably stably, into the genome. There may be more than one
heterologous nucleotide sequence per haploid genome.

According to a yet further aspect of the invention there is provided a method of transforming
a plant cell comprising introduction of a vector into a plant cell and causing or allowing
recombination between the vector and the plant cell genome to introduce a nucleic acid
according to the invention into the genome.

Plants transformed with a DNA construct of the invention may be produced by standard
techniques known in the art for the genetic manipulation of plants. DNA can be introduced
into plant cells using any suitable technology, such as a disarmed Ti-plasmid vector carried by
Agrobacterium exploiting its natural gene transferability (EP-A-270355, EP-A-0116718,
NAR 12(22):8711-8721 (1984), Townsend et al., US Patent No. 5,563,055); particle or
microprojectile bombardment (US Patent No. 5,100,792, EP-A-444882, EP-A-434616;
Sanford et al., US Patent No. 4,945,050; Tomes et al. (1995) "Direct DNA Transfer into Intact
Plant Cells via Microprojectile Bombardment", in Plant Cell, Tissue and Organ Culture:
Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin); and McCabe et
al. (1988) Biotechnology 6:923-926); microinjection (WO 92/09696, WO 94/00583, EP
331083, EP 175966, Green et al. (1987) Plant Tissue and Cell Culture, Academic Press,
Crossway et al. (1986) Biotechniques 4:320-334); electroporation (EP 290395, WO 8706614,
Riggs et al. (1986) Proc. Natl. Acad. Sci. USA 83:5602-5606; D'Halluin et al. (1992) Plant
Cell 4:1495-1505); other forms of direct DNA uptake (DE 4005152, WO 9012096, US Patent
No. 4,684,611, Paszkowski et al. (1984) EMBO J. 3:2717-2722); liposome-mediated DNA
uptake (e.g. Freeman et al. (1984) Plant Cell Physiol. 29:1353); or the vortexing method (e.g.
Kindle (1990) Proc. Natl. Acad. Sci. USA 87:1228). Physical methods for the transformation
of plant cells are reviewed in Oard (1991) Biotech. Adv. 9:1-11. See generally, Weissinger et
al. (1988) Ann. Rev. Genet. 22:421-477; Sanford et al. (1987) Particulate Sciences and
Technology 5:27-37; Christou et al. (1988) Plant Physiol. 87:671-674; McCabe et al. (1988)
Bio/Technology 6:923-926; Finer and McMullen (1991) In Vitro Cell Dev. Biol. 27P:175-182;
Singh et al. (1988) Theor. Appl. Genet. 96:319-324; Datta et al. (1990) Biotechnology
8:736-740; Klein et al. (1988) Proc. Natl. Acad. Sci. USA 85:4305-4309; Klein et al. (1988)
Biotechnology 6:559-563; Tomes, US Patent No. 5,240,855; Buising et al., US Patent Nos.
5,322,783 and 5,324,646; Klein et al. (1988) Plant Physiol. 91:440-444; Fromm et al. (1990)
Biotechnology 8:833-839; Hooykaas-Van Slogteren et al. (1984) Nature (London) 311:763-764;
Bytebier et al. (1987) Proc. Natl. Acad. Sci. USA 84:5345-5349; De Wet et al. (1985) in
The Experimental Manipulation of Ovule Tissues, ed. Chapman et al. (Longman, New York),
pp. 197-209; Kaeppler et al. (1990) Plant Cell Reports 9:415-418 and Kaeppler et al. (1992)
Theor. Appl. Genet. 84:560-566; Li et al. (1993) Plant Cell Reports 12:250-255 and Christou
and Ford (1995) Annals of Botany 75:407-413; Osjoda et al. (1996) Nature Biotechnology
14:745-750, all of which are herein incorporated by reference.

Agrobacterium transformation is widely used by those skilled in the art to transform
dicotyledonous species. Recently, there has been substantial progress towards the routine
production of stable, fertile transgenic plants in almost all economically relevant monocot
plants (Toriyama et al. (1988) Bio/Technology 6:1072-1074; Zhang et al. (1988) Plant Cell
Rep. 7:379-384; Zhang et al. (1988) Theor. Appl. Genet. 76:835-840; Shimamoto et al. (1989)
Nature 338:274-276; Datta et al. (1990) Bio/Technology 8:736-740; Christou et al. (1991)
Bio/Technology 9:957-962; Peng et al. (1991) International Rice Research Institute, Manila,
Philippines, pp. 563-574; Cao et al. (1992) Plant Cell Rep. 11:585-591; Li et al. (1993) Plant
Cell Rep. 12:250-255; Rathore et al. (1993) Plant Mol. Biol. 21:871-884; Fromm et al. (1990)
Bio/Technology 8:833-839; Gordon-Kamm et al. (1990) Plant Cell 2:603-618; D'Halluin et al.
(1992) Plant Cell 4:1495-1505; Walters et al. (1992) Plant Mol. Biol. 18:189-200; Koziel et
al. (1993) Biotechnology 11:194-200; Vasil, I.K. (1994) Plant Mol. Biol. 25:925-937; Weeks
et al. (1993) Plant Physiol. 102:1077-1084; Somers et al. (1992) Bio/Technology 10:1589-1594;
WO 92/14828). In particular, Agrobacterium-mediated transformation is now emerging
also as a highly efficient transformation method in monocots (Hiei et al. (1994) The Plant
Journal 6:271-282). See also Shimamoto, K. (1994) Current Opinion in Biotechnology
5:158-162; Vasil et al. (1992) Bio/Technology 10:667-674; Vain et al. (1995) Biotechnology
Advances 13(4):653-671; Vasil et al. (1996) Nature Biotechnology 14:702.

Microprojectile bombardment, electroporation and direct DNA uptake are preferred where
Agrobacterium is inefficient or ineffective. Alternatively, a combination of different
techniques may be employed to enhance the efficiency of the transformation process, e.g.
bombardment with Agrobacterium-coated microparticles (EP-A-486234) or microprojectile
bombardment to induce wounding followed by co-cultivation with Agrobacterium
(EP-A-486233).

Plants which include a plant cell according to the invention are also provided. 

In addition to the regenerated plant, the present invention embraces all of the following: a
clone of such a plant, seed, selfed or hybrid progeny and descendants (e.g. F1 and F2
descendants).

According to a further aspect of the invention there is provided an isolated nucleic acid 
molecule obtainable from Arabidopsis thaliana which comprises a nucleic acid sequence 
encoding a polypeptide having 

(1) GTase functionality; and

(2) is capable of adding a glucosyl group via an O-glucosidic linkage to form

(a) a glucosyl ester of at least one of:

cinnamic acid; p-coumaric acid; caffeic acid; ferulic acid; and sinapic acid;
and/or

(b) a 4-O-glucoside of at least one of:

cinnamic acid; p-coumaric acid; caffeic acid; ferulic acid; sinapic acid; p-coumaryl aldehyde;
coniferyl aldehyde; sinapyl aldehyde; p-coumaryl alcohol; coniferyl alcohol; and sinapyl
alcohol.




In a further aspect of the invention there is provided a polypeptide encoded by an isolated
nucleic acid molecule of the present invention wherein the said polypeptide is selected from
the polypeptides of Figures 1B, 2B, 3B, 4B, 5B, 6B, 7B, 8B, 9B, 10B and 11B or functional
variants and/or parts thereof. Preferably the polypeptide is selected from the group of
polypeptides of Figures 2B, 3B, 4B, 6B, 7B and 9B or functional variants and/or parts thereof.
Preferably still the polypeptide is selected from the group of polypeptides selected from
Figures 2B, 3B, 7B and 9B or functional variants and/or parts thereof. Most preferably the
polypeptide is one of the polypeptides shown in Figures 2B, 3B, 7B or 9B. Polypeptides
encoded by the sense nucleic acid sequences presented in Figures 12-32 are also provided
and readily derived from these sense sequences.

Variants of sequences having substantial identity or homology with the GTase molecules of
the invention may be utilized in the practices of the invention. That is, the GTase of Figures
1A-11A may be modified yet still remain functional. Generally, the GTase will comprise at
least about 40%-60%, preferably about 60%-80%, more preferably about 80%-95% sequence
identity with a GTase nucleotide sequence of Figures 1A-32 herein.
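The sequence-identity thresholds above can be illustrated with a short calculation. The sketch below is only an illustration: it assumes the two sequences have already been aligned to equal length (with `-` for gaps) by some external tool, and the function name `percent_identity` is hypothetical rather than taken from the specification or from the GCG package.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: '-' marks an alignment gap. Columns that are gaps in both
    sequences are ignored; a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# The first ten nucleotides of A062 (Figure 1A) against a made-up variant
# carrying a single substitution.
identity = percent_identity("ATGGCGCCAC", "ATGGCTCCAC")
print(f"{identity:.1f}% identity")
```

Other conventions exist for the denominator (for example, the length of the shorter sequence), so a stated percentage is only meaningful together with the alignment method used.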

The activity of functional variant polypeptides may be assessed by transformation into a host
capable of expressing the nucleic acid of the invention. Methodology for such transformation
is described in more detail below.

In a further aspect of the invention there is disclosed a method of producing a derivative
nucleic acid comprising the step of modifying any of the sequences disclosed above,
particularly the coding sequence of Figures 1A, 2A, 3A, 4A, 5A, 6A, 7A, 8A, 9A, 10A, 11A,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32.

Alternatively, changes to a sequence may produce a derivative by way of one or more of
addition, insertion, deletion or substitution of one or more nucleotides in the nucleic acid,
leading to the addition, insertion, deletion or substitution of one or more amino acids in the
encoded polypeptide.

Other desirable mutations may be introduced by random or site-directed mutagenesis in order
to alter the activity (e.g. specificity) or stability of the encoded polypeptide, or to produce
dominant negative variants which may alter the flux through lignin biosynthetic pathways and
so alter the amount of lignin or of an intermediate in the lignin biosynthetic pathway.

The invention will now be described with reference to the following Figures and Examples
which are not to be construed as limiting the invention.

Scheme 1: The major intermediates in the lignin biosynthesis pathway.

Figure 1A: Sense nucleotide sequence of A062. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 1B: The amino acid sequence of A062;
Figure 1C: The antisense nucleotide sequence of A062;
Figure 2A: Sense nucleotide sequence of A320. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 2B: The amino acid sequence of A320;
Figure 2C: The antisense nucleotide sequence of A320;
Figure 3A: Sense nucleotide sequence of A41. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 3B: The amino acid sequence of A41;
Figure 3C: The antisense nucleotide sequence of A41;
Figure 4A: Sense nucleotide sequence of A42. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 4B: The amino acid sequence of A42;
Figure 4C: The antisense nucleotide sequence of A42;
Figure 5A: Sense nucleotide sequence of A43. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 5B: The amino acid sequence of A43;
Figure 5C: The antisense nucleotide sequence of A43;
Figure 6A: Sense nucleotide sequence of A911. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 6B: The amino acid sequence of A911;
Figure 6C: The antisense nucleotide sequence of A911;
Figure 7A: Sense nucleotide sequence of A119. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 7B: The amino acid sequence of A119;
Figure 7C: The antisense nucleotide sequence of A119;
Figure 8A: Sense nucleotide sequence of A233. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 8B: The amino acid sequence of A233;
Figure 8C: The antisense nucleotide sequence of A233;
Figure 9A: Sense nucleotide sequence of A407. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 9B: The amino acid sequence of A407;
Figure 9C: The antisense nucleotide sequence of A407;
Figure 10A: Sense nucleotide sequence of A961. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 10B: The amino acid sequence of A961;
Figure 10C: The antisense nucleotide sequence of A961;
Figure 11A: Sense nucleotide sequence of A962. The coding region starts from the first nucleotide and ends at the last nucleotide;
Figure 11B: The amino acid sequence of A962;
Figure 11C: The antisense nucleotide sequence of A962;

Figure 12: The sense nucleotide sequence of UGT71B5;
Figure 13: The sense nucleotide sequence of UGT71C3;
Figure 14: The sense nucleotide sequence of UGT71C5;
Figure 15: The sense nucleotide sequence of UGT71D1;
Figure 16: The sense nucleotide sequence of UGT73B1;
Figure 17: The sense nucleotide sequence of UGT73B2;
Figure 18: The sense nucleotide sequence of UGT73B4;
Figure 19: The sense nucleotide sequence of UGT73B5;
Figure 20: The sense nucleotide sequence of UGT73C1;
Figure 21: The sense nucleotide sequence of UGT73 1 C;
Figure 22: The sense nucleotide sequence of UGT73C5;
Figure 23: The sense nucleotide sequence of UGT73C6;
Figure 24: The sense nucleotide sequence of UGT73C7;
Figure 25: The sense nucleotide sequence of UGT74F2;
Figure 26: The sense nucleotide sequence of UGT76E1;
Figure 27: The sense nucleotide sequence of UGT76E11;
Figure 28: The sense nucleotide sequence of UGT76E12;
Figure 29: The sense nucleotide sequence of UGT76E2;
Figure 30: The sense nucleotide sequence of UGT78D1;
Figure 31: The sense nucleotide sequence of UGT89B1;
Figure 32: The sense nucleotide sequence of UGT72B3;
Figure 33 shows recombinant GST-UGT71C1 fusion protein purified from E. coli using
glutathione-coupled Sepharose. The protein (5 µg) was analyzed using 10% SDS-PAGE and
was visualized with Coomassie staining;

Figure 34 shows three different glucose conjugates of caffeic acid (caffeoyl-3-O-glucoside,
caffeoyl-4-O-glucoside and 1-O-caffeoylglucose), obtained from the glucosyltransferase
reactions containing the recombinant UGT71C1, UGT73B3 and UGT84A1 respectively.
Each assay contained 1-2 µg of recombinant UGT, 1 mM caffeic acid, 5 mM UDP-glucose,
1.4 mM 2-mercaptoethanol and 50 mM Tris-HCl, pH 7.0. The mix was incubated at 30 °C
for 30 min and was then analyzed by reverse-phase HPLC. A linear gradient (10-16%)
of acetonitrile in H2O at 1 ml/min over 20 min was used to separate the glucose conjugates
from caffeic acid;

Figure 35A shows the pH optima of UGT71C1 glucosyltransferase activity measured over
the range pH 5.5-8.0 in reactions containing 50 mM buffer, 1 µg of UGT71C1, 1 mM
caffeic acid, 5 mM UDP-glucose and 1.4 mM 2-mercaptoethanol. The mix was incubated at
30 °C for 30 min. The reaction was stopped by the addition of 20 µl of trichloroacetic acid
(240 mg/ml) and was then analyzed by reverse-phase HPLC. The specific enzyme activity
was expressed as nanomoles of caffeic acid glucosylated per second (nkat) by 1 mg of protein
in 30 min of reaction time at 30 °C. Figure 35B shows the time course of UGT71C1
glucosyltransferase activity, studied by measuring the amount of caffeic acid glucosylated
by 1 µg of UGT71C1 in 50 mM Tris-HCl, pH 7.0. The reactions were carried out and
analyzed as described in A;

Figure 36 shows UGT71C1 transgenic Arabidopsis thaliana plants and their ability to
glucosylate caffeic acid; and

Figure 37 summarises the GTase activities of various GTase polypeptides with respect to
various anti-oxidant substrates.

EXAMPLES

MATERIALS AND METHODS 
Transformation of Woody Plant Species

The transformation of woody plant species is known in the art. See US4795855 and
WO91/18094; EP1050209 and WO97/25434. Each of these patents is incorporated in its
entirety by reference.

Transformation of Non-Woody Plant Species

Methods used in the transformation of plant species other than woody species are well known
in the art and are extensively referenced herein.

Identification of GTase sequences 

The GTase sequence identification was carried out using GCG software (Wisconsin package,
version 8.1). The Blasta programme was used to search the EMBL and GenBank sequence
databases for Arabidopsis protein sequences containing a PSPG (plant secondary product
UDP-glucose glucosyltransferase) signature motif (Hughes and Hughes (1994) DNA Sequence
5, 41-49). The database information on the GTases described in the present invention is
listed in Table 1.

Amplification and cloning of the GTase sequences

The GTase sequences were amplified from Arabidopsis thaliana Columbia genomic DNA
with specific primers (Table 2), following standard methodologies (Sambrook et al (1989)
Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold
Spring Harbor, NY). 50 ng of genomic DNA isolated from Arabidopsis thaliana Columbia
were incubated with 1 x pfu PCR buffer (Stratagene), 250 µM dNTPs, 50 pmole primer for
each end, and 5 units of pfu DNA polymerase (Stratagene) in a total of 100 µl. The PCR
reactions were carried out as outlined in the programme described in Table 3.

After PCR amplification, the products were double digested by the appropriate restriction
enzymes listed in Table 2 (bold type). The digested DNA fragments were purified using an
electro-elution method (Sambrook et al (1989) Molecular Cloning: A Laboratory Manual,
2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY) and ligated into the
corresponding cloning site on pGEX2T plasmid DNA (Pharmacia) by T4 DNA ligase (NEB)


at 16°C overnight. The resulting recombinant plasmid DNA was amplified in E. coli XL1-Blue
cells and was confirmed with the restriction enzymes listed in Table 2 (bold type)
following the method described by Sambrook et al (1989) Molecular Cloning: A Laboratory
Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

Preparation of glucosyltransferase recombinant proteins

E. coli cells carrying recombinant plasmid DNA as described above were grown at 37°C
overnight on a 2YT (16 g bacto-tryptone, 10 g bacto-yeast extract, 5 g NaCl per litre) agar
(1.8% w/v) plate which contained 50 µg/ml ampicillin. A single colony was picked into 2 ml
of 2YT containing the same concentration of ampicillin. The bacterial culture was incubated
at 37°C with moderate agitation for 6 h. The bacterial culture was then transferred into 1 L
of 2YT and incubated at 20°C. IPTG (0.1 mM) was added when the culture reached
logarithmic growth phase (A600nm = 0.5). The bacterial culture was incubated for another 24
h. The cells were collected by centrifugation at 7,000 x g for 5 min at 4°C and resuspended
in 5 ml spheroblast buffer (0.5 mM EDTA, 750 mM sucrose, 200 mM Tris, pH 8.0).
Lysozyme solution was added to a final concentration of 1 mg/ml. A 7-fold volume of 0.5 x
spheroblast buffer was poured into the suspension immediately and the suspension was
incubated at 4°C for 30 min under gentle shaking. The spheroblasts were collected by
centrifugation at 12,000 x g for 5 min at 4°C, and resuspended in 5 ml ice cold PBS buffer
(140 mM NaCl, 80 mM Na2HPO4, 15 mM KH2PO4). PMSF (2 mM) was added to the
suspension immediately and the suspension was centrifuged at 12,000 x g for 20 min at 4°C
in order to remove the cell debris. After the centrifugation, the supernatant was transferred to
a 15 ml tube. 200 µl of a 50% (v/v) slurry of Glutathione-coupled Sepharose 4B were added
to the tube and the mixture was mixed gently for 30 min at room temperature. The mixture
was then centrifuged at a very slow speed (500 x g) for 1 min and the supernatant was
discarded. The beads were washed with 5 ml ice cold PBS buffer three times. After each wash,
a short centrifugation was applied as described above to sediment the Sepharose beads. To
recover the expressed protein from the Sepharose beads, 100 µl of 20 mM reduced glutathione
were used to resuspend the beads. After 10 min incubation at room temperature, the beads
were sedimented and the supernatant containing the expressed protein was collected. The
elution step was repeated once, and both supernatant fractions were combined and stored at
4°C for protein assay and further studies.
Protein concentration assay

The protein assays were carried out by adding 10 µl of protein solution to 900 µl of distilled
water and 200 µl of Bio-Rad Protein Assay Dye. The absorbance at 595 nm was read. A
series of BSA (bovine serum albumin) solutions at different concentrations was used as the
standard. A regression line was plotted of the BSA concentration against the
reading at 595 nm. The concentration of the protein sample was then estimated from the
regression line.
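The standard-curve step of the protein assay can be sketched as an ordinary least-squares fit. The example below is only an illustration: the BSA amounts and absorbance readings are invented, not data from this specification, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def protein_from_a595(a595, slope, intercept):
    """Read an unknown sample off the fitted standard curve."""
    return (a595 - intercept) / slope


# Hypothetical BSA standards (µg per assay) and their A595 readings.
bsa_ug = [0.0, 2.0, 4.0, 6.0, 8.0]
a595 = [0.05, 0.15, 0.25, 0.35, 0.45]

slope, intercept = fit_line(bsa_ug, a595)
sample_ug = protein_from_a595(0.30, slope, intercept)
print(f"slope={slope:.3f} per µg, intercept={intercept:.3f}, sample={sample_ug:.2f} µg")
```

Readings that fall outside the range of the standards should be diluted and re-measured rather than extrapolated, since dye-binding assays are only approximately linear.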

Assay for enzyme activity

A standard glucosylation reaction was set up by mixing 2 µg of recombinant protein with 14
mM 2-mercaptoethanol, 5 mM UDP-glucose, 1 mM of the lignin or antioxidant substrate under
test and 100 mM Tris, pH 7.0, to a total volume of 200 µl. The reaction was carried out at
30°C for 30 min and stopped by the addition of 20 µl trichloroacetic acid (240 mg/ml). All the
samples were stored at -20°C before the liquid chromatographic assay.
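The specific activities reported in this specification are given in nkat/mg, that is, nanomoles of substrate converted per second by 1 mg of protein. A minimal sketch of that unit conversion follows; the function name and the example amount of product are hypothetical and are not measurements from the specification.

```python
def specific_activity_nkat_per_mg(nmol_converted, reaction_min, protein_ug):
    """Specific activity in nkat/mg = nmol converted per second per mg protein."""
    if reaction_min <= 0 or protein_ug <= 0:
        raise ValueError("reaction time and protein amount must be positive")
    seconds = reaction_min * 60.0
    protein_mg = protein_ug / 1000.0
    return nmol_converted / seconds / protein_mg


# Hypothetical example: 36 nmol of substrate glucosylated in a 30 min
# reaction containing 2 µg of recombinant protein.
activity = specific_activity_nkat_per_mg(36.0, 30.0, 2.0)
print(f"{activity:.1f} nkat/mg")
```

For scale, a 200 µl reaction at 1 mM substrate contains 200 nmol of substrate, which caps the amount that can be converted in any single assay.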

High-Performance Liquid Chromatography

Reverse-phase HPLC (Waters Separator 2690 and Waters Tunable Absorbance Detector 486,
Waters Limited, Herts, UK) was performed using a Columbus 5 µ C18 column (250 x 4.60 mm,
Phenomenex). A linear gradient of acetonitrile in H2O (all solutions contained 0.1%
trifluoroacetic acid) at 1 ml/min over 20 min was used to separate the glucose conjugates
from their aglycone. The HPLC methods were as follows: cinnamic acid, λ 288 nm,
10-55% acetonitrile; p-coumaric acid, λ 311 nm, 10-25% acetonitrile; caffeic acid, λ 311 nm,
10-16% acetonitrile; ferulic acid, λ 311 nm, 10-35% acetonitrile; sinapic acid, λ 306 nm,
10-40% acetonitrile; p-coumaryl aldehyde, λ 315 nm, 10-46% acetonitrile; coniferyl aldehyde,
λ 283 nm, 10-47% acetonitrile; sinapyl aldehyde, λ 280 nm, 10-47% acetonitrile; p-coumaryl
alcohol, λ 283 nm, 10-27% acetonitrile; coniferyl alcohol, λ 306 nm, 10-25% acetonitrile;
sinapyl alcohol, λ 285 nm, 10-25% acetonitrile. The retention times (Rt) of the glucose
conjugates analysed were as follows: cinnamoylglucose, Rt = 12.3 min; p-coumaroylglucose,
Rt = 10.6 min; caffeoylglucose, Rt = 8.5 min; feruloylglucose, Rt = 10.3 min; sinapoylglucose,
Rt = 9.7 min; caffeoyl-4-O-glucoside, Rt = 6.8 min; feruloyl-4-O-glucoside, Rt = 7.8 min;
sinapoyl-4-O-glucoside, Rt = 8.2 min; coniferin, Rt = 8.2 min; syringin, Rt = 9.1 min.

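The linear acetonitrile gradients can be written as a simple function of run time. The sketch below is illustrative: it computes only the programmed mobile-phase composition and assumes an ideal linear ramp over the 20 min run, ignoring instrument dwell volume; the names `GRADIENTS` and `programmed_acetonitrile` are hypothetical.

```python
# Start and end acetonitrile percentages per substrate, as listed for the
# 20 min linear gradients at 1 ml/min.
GRADIENTS = {
    "cinnamic acid": (10.0, 55.0),
    "p-coumaric acid": (10.0, 25.0),
    "caffeic acid": (10.0, 16.0),
    "ferulic acid": (10.0, 35.0),
    "sinapic acid": (10.0, 40.0),
    "p-coumaryl aldehyde": (10.0, 46.0),
    "coniferyl aldehyde": (10.0, 47.0),
    "sinapyl aldehyde": (10.0, 47.0),
    "p-coumaryl alcohol": (10.0, 27.0),
    "coniferyl alcohol": (10.0, 25.0),
    "sinapyl alcohol": (10.0, 25.0),
}

RUN_MIN = 20.0  # gradient duration in minutes


def programmed_acetonitrile(substrate, t_min):
    """Programmed % acetonitrile at time t_min for the given substrate's method."""
    if not 0.0 <= t_min <= RUN_MIN:
        raise ValueError("time must lie within the 20 min gradient")
    start, end = GRADIENTS[substrate]
    return start + (end - start) * t_min / RUN_MIN


# Programmed composition at the retention time quoted for caffeoylglucose.
pct = programmed_acetonitrile("caffeic acid", 8.5)
print(f"{pct:.2f}% acetonitrile at 8.5 min")
```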



The recombinant GTases were shown to have GTase activity towards the major intermediates 
of the lignin biosynthesis pathway (Tables 5 and 6). It is clear from these results that the 
GTases display different specific activity reaction profiles relative to each other on the 
various lignin precursor substrates utilised. 

Michaelis-Menten kinetics were also studied on several of the GTases against their preferred 
substrates (Tables 7 and 8). It is clear from these results that the GTases display different 
enzyme kinetics for different substrates. 
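For readers comparing the kinetic constants, the Michaelis-Menten rate law relates Km and Vmax to the expected initial rate at a given substrate concentration, v = Vmax·[S] / (Km + [S]). The sketch below uses round, hypothetical constants rather than values from Tables 7 and 8.

```python
def michaelis_menten_rate(substrate_mM, km_mM, vmax):
    """Initial rate v = Vmax * [S] / (Km + [S]); v has the units of vmax."""
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return vmax * substrate_mM / (km_mM + substrate_mM)


# Hypothetical enzyme: Km = 0.5 mM, Vmax = 100 nkat/mg.
for s in (0.1, 0.5, 1.0, 5.0):
    v = michaelis_menten_rate(s, 0.5, 100.0)
    print(f"[S] = {s} mM -> v = {v:.1f} nkat/mg")
```

At [S] = Km the rate is half of Vmax, which is why a low Km together with a high Vmax marks a preferred substrate.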

The results (in total) indicate that certain GTases show a greater potential for use in the 
alteration of lignin biosynthesis in planta than others. 

Reducing the Formation of Monolignol Glucosides In Planta

In one approach to reduce the formation of monolignol glucosides in planta, A119 and A407
are down-regulated using an antisense strategy (A). Expression of the A119 and A407
antisense sequences is driven by the gene's own promoter. An alternative approach (B) is to
modify the UDP-glucose binding motif through an in vitro mutagenesis method (Lim et al.,
1998) such that the mutant protein is able to bind the monolignol substrates but loses its
catalytic activity. Such mutant proteins are thought to compete with the functional native
protein by binding specifically to monolignols, thereby reducing the formation of monolignol
glucosides.

Anti-sense approach

Amplification and cloning of the A119 and A407 promoter sequences

Approximately 2 kb of the 5' flanking sequences of A119 and A407 are amplified directly
from genomic DNA by PCR. The promoter fragments are then cloned into a pBluescript
plasmid vector (Sambrook et al., 1989).

Construction of chimaeric genes of the A119 and A407 promoters and their ORF regions in
antisense orientation

The A119 and A407 cDNA fragments are amplified from pGEXA119 and pGEXA407 by
PCR. The fragments are then ligated correspondingly into the A119 and A407 promoter
constructs described in (A)-(1) with the ORF region in the antisense orientation.


The DNA fragments containing the A119 and A407 antisense chimaeric genes are amplified
by PCR from the chimaeric constructs described in (A)-(2). The fragments are then ligated
into a binary vector (Sambrook et al., 1989). The final constructs are then transformed into
plants.
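Placing an ORF in the antisense orientation means that the transcribed strand is the reverse complement of the sense strand. The sketch below illustrates this under the assumption that an antisense sequence is simply the reverse complement of the corresponding sense sequence; the function name is hypothetical and only unambiguous bases are handled.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sense: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    cleaned = "".join(sense.split())  # drop spaces and line breaks from listings
    invalid = set(cleaned.upper()) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    return cleaned.translate(_COMPLEMENT)[::-1]


# First ten nucleotides of the A062 sense sequence as printed in Figure 1A.
print(reverse_complement("ATGGCGCCAC"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check when a sense and an antisense listing are compared.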

Mutant gene approach

In vitro site mutagenesis to modify the UDP-glucose binding motif in A119 and A407

In vitro site mutagenesis is carried out by PCR to modify the sequences encoding the
UDP-glucose binding motif in A119 and A407 (Lim et al., 1998). The constructs pGEXA119 and
pGEXA407 are used as the DNA templates in the PCR reaction.

Construction of chimaeric mutant genes regulated by the A119 and A407 promoters

The A119 and A407 mutant genes are amplified from the pGEXA119 and pGEXA407 mutant
constructs described in (B)-(1) by PCR. The A119 and A407 mutant gene fragments are then
ligated into the A119 and A407 promoter constructs described in (A)-(1) with the ORF region
in the sense orientation.

Preparation of binary constructs containing the chimaeric mutant genes A119 and A407

The DNA fragments containing the A119 and A407 mutant chimaeric genes are amplified by
PCR from the chimaeric constructs described in (B)-(2). The fragments are then ligated into a
binary vector (Sambrook et al., 1989). The final constructs are then transformed into plants.


Enhancing the formation of monolignol glucosides in planta

The CaMV 35S promoter fragment is used to drive the expression of A119 and A407. DNA
fragments containing the A119 and A407 ORF sequences are amplified from pGEXA119 and
pGEXA407 correspondingly by PCR. The DNA fragments are ligated downstream of the
CaMV 35S promoter. The constructs are used to transform plants such that the lignin content
and composition are altered.

Table 1 Database information on eleven Arabidopsis GTase genes

Gene name   protein_id          chromosome   Database acc. no.   BAC/P1 clone   gene name in database
A062        gi|3935156          I            ac005106            T25N20         T25N20.20
A320        Not annotated       III          ab019232            MIL23          not annotated
A41         emb|CAB10326.1      IV           Z97339              FCA4           d13780c
A42         emb|CAB10327.1      IV           Z97339              FCA4           d13785c
A43         emb|CAB10328.1      IV           Z97339              FCA4           d13790c
A911        gi|2642451          II           ac002391            T20D16         T20D16.11
A119        Not annotated       V            ab018119            MSN2           not annotated
A233        Wrongly annotated   IV           al021961            F28A23         wrongly annotated
A407        gi|3319344          V            af077407            F9D12          F9D12.4
A961        gi|3582329          II           ac005496            T27A16         T27A16.15
A962        gi|3582341          II           ac005496            T27A16         T27A16.16

Parameters used for the search of the above Arabidopsis sequences and the programme used are as
follows:

NETBLAST with the default settings:
Infile2=nr
Matrix=Blosum 62
Translated
Dbtranslate=1
Table 2 DNA sequences and restriction enzyme sites in primers used in amplification of 11
Arabidopsis GTase sequences from genomic DNA.

Sequences complementary to either end of the ORFs are underlined. Restriction enzyme sites that were
used in making expression constructs are in bold type.

primer     DNA sequence (5'->3')                             restriction enzyme sites

A062 5'    CGGGTGATCAGGTACC ATGGCGCCACCGCATTTTC              BclI and KpnI
A062 3'    CGGAATTCGTCGACTTACTTTAP TTTTArrTrrTr              EcoRI and SalI
A320 5'    CCCCCGGGTACCATOGAGCTAG AATCTTrTrrrr               ^ , n A r r>
A320 3'    CGGAATTCTCGAGTTAAAAGrTT TTGATTr.ATrr              EcoRI and XhoI
A41 5'     TGGGATCC ATATCAGA A A TGGTflTTr                   BamHI
A41 3'     GGGAATTCC TAGTATCCATTATCTTTAr.Tr                  EcoRI
A42 5'     GGGGATCC ATGGACCCGTfTPGTCATACTr                   BamHI
A42 3'     GGG A ATTCC ACTAGTriTTrT CCGTTOTrTTr              EcoRI
A43 5'     GGGGATCCAATATGGAGAT GGAATrr.TrnTTAr               BamHI
A43 3'     GGGAATTCCTTACACnArAT TATTAATr.TTrr.               EcoRI
A911 5'    GGGGTACCTGATCAATAATGGG CAGTACTr.Ar,r, G           KpnI and BclI
A911 3'    CGG AATTCGTCG ACG AGTTAGG CG ATTGTG A T a T C     EcoRI and SalI
A119 5'    CGGG ATCCGGTACC ATGC ATATC AC AAAACPAr A£         BamHI and KpnI
A119 3'    CGGAATTCGCTAGCTAAGCACC ACGTGACAAfiT CC            EcoRI and NheI
A233 5'    CGGGATCCGGTACCATGAGTAG TGATCCTCATrr, I            BamHI and KpnI
A233 3'    CGGGATCCGAATTCTACGACGT AAACTCTTCTAT               BamHI and EcoRI
A407 5'    CGGGATCCGGTACC ATGCATATCACAAAACCAr                BamHI and KpnI
A407 3'    CGGAATTCGTCGAC CTAAGCACCACGTCCCAAG                EcoRI and SalI
A961 5'    GGGTGATCAGGTACC ATGGGGAAGCAAGAAC.AT fi            BclI and KpnI
A961 3'    CGGAATTCGTCGA CTACTTACTTATAGAAACGCC G             EcoRI and SalI
A962 5'    GAAGATCTGGTACC ATGGCGA AGr Ar,r A ap.a Ar.        BglII and KpnI
A962 3'    CGGAATTCGTCGAC CGATCAAAGrrcATCTATG                EcoRI and SalI
Table 3 PCR programme

Stage I (1 cycle)   Stage II (40 cycles)   Stage III (1 cycle)
95°C 5 min          95°C 1 min             95°C 2 min
55°C 2 min          55°C 1 min             55°C 2 min
72°C 3 min          72°C 2 min             72°C 5 min
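As a quick planning aid, the PCR programme of Table 3 can be written as data and its total hold time summed. This is a sketch only: it assumes each stage runs its three steps (95°C, 55°C, 72°C) in order for the stated number of cycles, and it ignores the time the thermal cycler spends ramping between temperatures. The names `PROGRAMME` and `total_hold_minutes` are hypothetical.

```python
# (number of cycles, [(temperature in °C, hold time in min), ...]) per stage,
# read from Table 3 on the assumption that each column is one stage.
PROGRAMME = [
    (1, [(95, 5), (55, 2), (72, 3)]),    # Stage I
    (40, [(95, 1), (55, 1), (72, 2)]),   # Stage II
    (1, [(95, 2), (55, 2), (72, 5)]),    # Stage III
]


def total_hold_minutes(programme):
    """Sum of programmed hold times across all stages and cycles."""
    return sum(
        cycles * sum(minutes for _temp, minutes in steps)
        for cycles, steps in programme
    )


minutes = total_hold_minutes(PROGRAMME)
print(f"programmed hold time: {minutes} min (about {minutes / 60:.1f} h)")
```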




Table 4 The HPLC conditions

Lignin precursor       Acetonitrile gradient (%)   Detector wavelength (nm)
cinnamic acid          10-55                       288
p-coumaric acid        10-25                       311
caffeic acid           10-16                       311
ferulic acid           10-35                       311
sinapic acid           10-40                       306
p-coumaryl aldehyde    10-46                       315
coniferyl aldehyde     10-47                       283
sinapyl aldehyde       10-47                       280
p-coumaryl alcohol     10-27                       283
coniferyl alcohol      10-25                       306
sinapyl alcohol        10-25                       285
Table 5 Specific activity of the recombinant GTases producing glucose ester against lignin
precursors

Each assay contained 0.5 ml of potential substrates, 5 mM UDPG and 0.2 µg of recombinant GTases
in a total volume of 200 µl. The reactions were incubated at 20 °C for 30 min and were stopped by
addition of 20 µl TCA (240 mg/ml). Each reaction mix was then analysed using HPLC. The specific
activity (nkat/mg) of the recombinant GTase is defined as the amount of substrate (nmole) converted to
glucose ester per second by 1 mg of protein at 20 °C under the reaction conditions.

                   A41     A320    A42     A43    A911   A062
Cinnamic acid      0.30    0.06    14.21   0.02   8.77   1.62
p-coumaric acid    13.53   0.05    4.69    0.03   4.31   2.54
Caffeic acid       2.61    0.05    0.62    0.01   0.77   0.26
Ferulic acid       6.64    0.54    15.63   0.04   2.88   0.08
Sinapic acid       5.35    15.58   11.97   0.05   0.15   0.1
Table 6 Specific activity of the recombinant GTases producing O-glucosides against
lignin precursors

The reactions were set up following the conditions described in Table 1. All the reactions,
except those containing the aldehydes, were stopped by the addition of TCA. The aldehyde
assay mixes were injected into the HPLC immediately after the reactions were completed. The
specific activity (nkat/mg) of the recombinant GTase is defined as the amount of substrate
(nmole) converted to 4-O-glucoside per second by 1 mg of protein at 30 °C under the reaction
conditions.

                       [header missing]   A119     A407     A961   A962
Cinnamic acid          ND(a)              ND       ND       ND     ND
p-coumaric acid        0.09               0.02     0.01     0.01   0.01
caffeic acid           0.48               0.13     0.07     0.07   ND
ferulic acid           0.37               14.48    0.25     ND     ND
sinapic acid           0.39               102.56   65.39    0.01   0.01
p-coumaryl aldehyde    ND                 0.03     ND       0.01   0.02
coniferyl aldehyde     ND                 1.08     ND       0.16   0.34
sinapyl aldehyde       ND                 4.55     ND       0.57   0.50
p-coumaryl alcohol     ND                 ND       ND       ND     ND
coniferyl alcohol      0.46               67.53    2.78     0.57   0.49
sinapyl alcohol        0.05               126.16   114.76   0.35   0.45

(a) ND, not detected
Table 7 Kinetic studies on the recombinant GTases producing glucose esters against lignin precursors



A41 




A320 




A42 




A911 




A062 




Km 




K m 


v 

♦mix 


K m 




K m 




K m 


Vina* 


MM 


nkat/mg 


mM 


nkat/mg 


mM 


nkat/m 
g 


mM 


nkat/m 
g 


mM 


nkat/ 
mg 


1.51 


— 


1.80 


— 


0.72 


— 


1.05 


— 


2.36 


— 










0.49 


19.42 


0.05 


9.06 


4.33 


187 


0.10 


16.13 






0.40 


6.67 


0.39 


11.10 


5.05 


4.91 


0.06 


20.24 






0.20 


1.67 


0.23 


1.18 






0.35 


11.35 






0.36 


18.35 


0.34 


6.91 






0.24 


6.78 


0.06 


8.37 


0.13 


12.80 
Table 8 Kinetic studies on the recombinant GTases producing O-glucosides against
lignin precursors

                     A119                       A407
                     Km (mM)   Vmax (nkat/mg)   Km (mM)   Vmax (nkat/mg)
UDPG                 0.93      -                0.89      -
ferulic acid         0.25      18.87            -         -
sinapic acid         0.51      131.58           0.14      75.19
coniferyl alcohol    0.26      92.59            -         -
sinapyl alcohol      1.10      322.58           1.07      357.10




Table 9

1H and 13C NMR spectra were recorded in deuterated methanol at 500 MHz and 125 MHz
respectively. Chemical shifts are given on the δ scale with TMS as internal standard. The position
on the aromatic ring begins with the carbon joining the propanoic acid. d, doublet; dd, doublet
of doublets; m, multiplet; J, coupling constant.

            Caffeic acid                                  Caffeoyl-3-O-glucoside
Position    δH                                 δC         δH                                 δC
C1          -                                  128.1      -                                  127.6
C2          7.02 (1H, d, J = 2.0 Hz)           115.2      7.47 (1H, d, J = 2.0 Hz)           117.0
C3          -                                  146.7      -                                  146.0
C4          -                                  149.4      -                                  150.6
C5          6.77 (1H, d, J = 8.0 Hz)           116.6      6.84 (1H, d, J = 8.5 Hz)           117.8
C6          6.92 (1H, dd, J = 8.0, 2.0 Hz)     122.8      7.13 (1H, dd, J = 8.5, 2.0 Hz)     125.6
C7          7.53 (1H, d, J = 16.0 Hz)          146.9      7.45 (1H, d, J = 14.5 Hz)          146.6
C8          6.21 (1H, d, J = 15.5 Hz)          116.3      6.33 (1H, d, J = 14.5 Hz)          116.1
C9          -                                  171.5      -                                  170.4
Glc-1                                                     ~4.86 (signal interrupted)         103.9
Glc-2                                                     3.40-3.50 (4H, m; Glc-2 to Glc-5)  74.5
Glc-3                                                                                        78.0
Glc-4                                                                                        71.0
Glc-5                                                                                        77.2
Glc-6                                                     3.93 (1H, dd, J = 12.0, 2.0 Hz)    62.4
                                                          3.71 (1H, dd, J = 12.0, 5.5 Hz)
Table 10

Each assay contained 1 µg of UGT71C1, 1 mM phenolic compound, 5 mM UDP-glucose, 1.4
mM 2-mercaptoethanol and 50 mM Tris-HCl, pH 7.0. The mix was incubated at 30 °C for 30
min. The reaction was stopped by the addition of 20 µl of trichloroacetic acid (240 mg/ml)
and was then analysed by reverse-phase HPLC. The results represent the mean of
three replicates ± standard deviation.

Substrate                    Specific activity (nkat/mg)
o-Coumaric acid              1.5 ± 0.2
m-Coumaric acid              1.2 ± 0.2
p-Coumaric acid              0
Caffeic acid                 2.9 ± 0.8
Ferulic acid                 0
Sinapic acid                 0
Esculetin                    34.8 ± 4.2
Scopoletin                   29.4 ± 3.9
Salicylic acid               0
4-hydroxybenzoic acid        0
3,4-dihydroxybenzoic acid    0
Eriodictyol                  0
Luteolin                     0.7 ± 0.1
Quercetin                    1.4 ± 0.4
Catechin                     0
Cyanidin                     0
CLAIMS

1. A transgenic plant comprising a nucleic acid molecule which encodes a polypeptide
which:

i) has GTase activity;

ii) is selected from the group comprising: sequences of Figures 1A, 2A, 3A, 4A, 5A, 6A,
7A, 8A, 9A, 10A, 11A, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32;

iii) nucleic acids which hybridise to the sequences represented in (ii) above; and

iv) nucleic acid sequences which are degenerate as a result of the genetic code to the
sequences defined in (ii) and (iii) above.

2. A transgenic plant according to Claim 1 wherein the nucleic acid molecule anneals
under stringent hybridisation conditions to the sequence presented in Figures 1A, 2A, 3A, 4A,
5A, 6A, 7A, 8A, 9A, 10A, 11A, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32.

3. A transgenic plant according to Claim 1 or 2 wherein the nucleic acid molecule is
selected from Figure 7A or 9A.

4. A transgenic plant comprising a nucleic acid molecule wherein the nucleic acid
molecule is selected from the group comprising:

i) antisense sequences selected from the group comprising: sequences of Figures
1C, 2C, 3C, 4C, 5C, 6C, 7C, 8C, 9C, 10C and 11C or parts thereof, or antisense
sequences of the sense sequences presented in Figures 12-32;

ii) antisense sequences which will hybridise to the sense sequences according to any
of Claims 1-3 and which inhibit GTase activity.

5. A transgenic plant according to Claim 4 wherein the antisense sequence is selected 
from Figure 7C or 9C. 

6. A transgenic plant according to any of Claims 1-5 wherein the nucleic acid is cDNA. 

7. A transgenic plant according to any of Claims 1-5 wherein the nucleic acid is
genomic DNA.

8. A transgenic plant according to any of Claims 1-7 wherein the plant is a woody
plant selected from: poplar; eucalyptus; Douglas fir; pine; walnut; ash; birch; oak; teak;
spruce.

9. A transgenic plant according to Claim 8 wherein the woody plant is a plant used
typically in the paper industry, for example poplar.

10. A transgenic plant according to any of Claims 1-7 wherein the plant is a non-woody
plant species.

11. A method for the manufacture of paper or board comprising:

i) pulping transgenic wood material derived from a transgenic woody plant
according to any of Claims 1-10; and

ii) producing paper from said pulped transgenic wood material.

12. Paper having the characteristics of paper manufactured by the method according to 
Claim 11. 

13. A product comprising the paper according to Claim 12. 

14. A transgenic eukaryotic cell comprising a nucleic acid molecule which encodes a 
polypeptide which: 

i) has GTase activity; 

ii) is selected from the group comprising sequences of Figures 1A, 2A, 3A, 4A, 5A, 6A,
7A, 8A, 9A, 10A, 11A, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32;

iii) nucleic acids which hybridise to the sequences represented in (ii) above; and

iv) nucleic acid sequences which are degenerate as a result of the genetic code to the
sequences defined in (ii) and (iii) above.

15. A transgenic eukaryotic cell according to Claim 14 wherein the nucleic acid
sequence is selected from: Figures 1A, 3A, 4A, 5A, 8A, 10A, 11A, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32.

16. A transgenic eukaryotic cell according to Claim 15 wherein the nucleic acid
sequence is presented in Figure 1A, 3A, 4A, 5A, 7A, 8A, 9A, 10A.

17. Use of the eukaryotic cell according to any of Claims 14-16 for the glucosylation
of: caffeic acid; luteolin; quercetin; catechin; cyanidin.

18. A transgenic prokaryotic cell comprising a nucleic acid molecule which encodes a
polypeptide which:

i) has GTase activity;

ii) is selected from the group comprising sequences of Figures 1A, 2A, 3A, 4A, 5A, 6A,
7A, 8A, 9A, 10A, 11A, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32;

iii) nucleic acids which hybridise to the sequences represented in (ii) above; and

iv) nucleic acid sequences which are degenerate as a result of the genetic code to the
sequences defined in (ii) and (iii) above.

19. A transgenic prokaryotic cell according to Claim 18 wherein the nucleic acid sequence
is selected from: Figures 1A, 3A, 4A, 5A, 8A, 10A, 11A, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32.

20. A transgenic prokaryotic cell according to Claim 19 wherein the nucleic acid sequence is
presented in Figure 1A, 3A, 4A, 5A, 7A, 8A, 9A, 10A.

21. Use of the prokaryotic cell according to any of Claims 18-20 for the glucosylation
of: caffeic acid; luteolin; quercitin; catechin; cyanidin.

FIGURE 1A A062 SENSE NUCLEOTIDE SEQUENCE

1 ATGGCGCCAC CGCATTTTCT ACTGGTAACG TTTCCGGCGC AAGGTCACGT 

51 GAACCCATCT CTCCGTTTTG CTCGTCGGCT CATCAAAAGA ACCGGCGCAC 

101 GTGTCACTTT CGTCACTTGT GTCTCCGTCT TCCACAACTC CATGATCGCA 

151 AACCACAACA AAGTCGAAAA TCTCTCTTTC CTTACTTTCT CCGACGGTTT

201 CGACGATGGA GGCATTTCCA CCTACGAAGA CCGTCAGAAA AGGTCGGTGA 

251 ATCTCAAGGT TAACGGCGAT AAGGCACTAT CGGATTTCAT CGAAGCTACT 

301 AAGAATGGTG ACTCTCCCGT GACTTGCTTG ATCTACACGA TTCTTCTCAA 

351 TTGGGCTCCA AAAGTAGCAC GTAGATTTCA ACTTCCCTCC GCTCTTCTCT 

401 GGATCCAACC GGCTTTGGTT TTCAACATCT ATTACACTCA TTTCATGGGA 

451 AACAAGTCCG TTTTCGAGTT ACCTAATCTG TCTTCTCTGG AAATCAGAGA 

501 TCTTCCATCT TTCCTCACAC CTTCCAACAC AAACAAAGGC GCATACGATG 

551 CGTTTCAAGA AATGATGGAG TTTCTCATAA AAGAAACCAA ACCGAAAATT 

601 CTCATCAACA CTTTCGATTC GCTGGAACCA GAGGCCTTAA CGGCTTTCCC 

651 GAATATCGAT ATGGTGGCGG TTGGTCCTTT ACTTCCCACG GAGATTTTCT 

701 CAGGAAGCAC CAACAAATCA GTTAAAGATC AAAGTAGTAG TTATACACTT 

751 TGGCTAGACT CGAAAACAGA GTCCTCTGTT ATTTACGTTT CCTTTGGAAC 

801 AATGGTTGAG TTGTCCAAGA AACAGATAGA GGAACTAGCG AGAGCACTCA 

851 TAGAAGGGAA ACGACCGTTT TTGTGGGTTA TAACTGATAA ATCCAACAGA 

901 GAAACGAAAA CAGAAGGAGA AGAAGAGACA GAGATTGAGA AGATAGCTGG 

951 ATTCAGACAC GAGCTTGAAG AGGTTGGGAT GATTGTGTCG TGGTGTTCGC 

1001 AGATAGAGGT TTTAAGTCAC CGAGCCGTAG GTTGTTTTGT GACTCATTGT 

1051 GGGTGGAGCT CGACGCTGGA GAGTTTGGTT CTTGGCGTTC CGGTTGTGGC 

1101 GTTTCCGATG TGGTCGGATC AACCGACGAA CGCGAAGCTA CTGGAAGAAA 

1151 GTTGGAAGAC TGGTGTGAGG GTAAGAGAGA ACAAGGATGG TTTGGTGGAG 

1201 AGAGGAGAGA TCAGGAGGTG TTTGGAAGCC GTGATGGAGG AGAAGTCGGT 

1251 GGAGTTGAGG GAAAACGCAA AGAAATGGAA GCGTTTAGCG ATGGAAGCGG 

1301 GTAGAGAAGG AGGATCTTCG GATAAGAACA TGGAGGCTTT TGTGGAGGAT 

1351 ATTTGTGGAG AATCTCTTAT TCAAAACTTG TGTGAAGCAG AGGAGGTAAA 

1401 AGTAAAGTAA 

FIGURE 1B A062 AMINO ACID SEQUENCE

1 MAPPHFLLVT FPAQGHVNPS LRFARRLIKR TGARVTFVTC VSVFHNSMIA
51 NHNKVENLSF LTFSDGFDDG GISTYEDRQK RSVNLKVNGD KALSDFIEAT
101 KNGDSPVTCL IYTILLNWAP KVARRFQLPS ALLWIQPALV FNIYYTHFMG
151 NKSVFELPNL SSLEIRDLPS FLTPSNTNKG AYDAFQEMME FLIKETKPKI
201 LINTFDSLEP EALTAFPNID MVAVGPLLPT EIFSGSTNKS VKDQSSSYTL
251 WLDSKTESSV IYVSFGTMVE LSKKQIEELA RALIEGKRPF LWVITDKSNR
301 ETKTEGEEET EIEKIAGFRH ELEEVGMIVS WCSQIEVLSH RAVGCFVTHC
351 GWSSTLESLV LGVPVVAFPM WSDQPTNAKL LEESWKTGVR VRENKDGLVE
401 RGEIRRCLEA VMEEKSVELR ENAKKWKRLA MEAGREGGSS DKNMEAFVED
451 ICGESLIQNL CEAEEVKVK
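
The relationship between the sense nucleotide listing of Figure 1A and the amino acid listing of Figure 1B, and the degeneracy of the genetic code referred to in Claims 14 and 18, can be illustrated with a minimal sketch. It assumes the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative only and are not part of the application.

```python
# Standard genetic code in the conventional T, C, A, G ordering; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a sense-strand DNA string, stopping at the first stop codon."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base groups
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 30 bases of Figure 1A.
print(translate("ATGGCGCCAC CGCATTTTCT ACTGGTAACG"))  # MAPPHFLLVT

# Degeneracy: two different codons encoding the same residue (aspartate).
print(CODON_TABLE["GAC"], CODON_TABLE["GAT"])  # D D
```

The first ten residues printed agree with the start of Figure 1B. The same function can be applied to the other sense listings, bearing in mind that OCR damage in a listing will surface as a mismatch against the corresponding amino acid figure.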

FIGURE 1C A062 ANTISENSE NUCLEOTIDE SEQUENCE

1 TTACTTTACT TTTACCTCCT CTGCTTCACA CAAGTTTTGA ATAAGAGATT
51 CTCCACAAAT ATCCTCCACA AAAGCCTCCA TGTTCTTATC CGAAGATCCT
101 CCTTCTCTAC CCGCTTCCAT CGCTAAACGC TTCCATTTCT TTGCGTTTTC
151 CCTCAACTCC ACCGACTTCT CCTCCATCAC GGCTTCCAAA CACCTCCTGA
201 TCTCTCCTCT CTCCACCAAA CCATCCTTGT TCTCTCTTAC CCTCACACCA
251 GTCTTCCAAC TTTCTTCCAG TAGCTTCGCG TTCGTCGGTT GATCCGACCA
301 CATCGGAAAC GCCACAACCG GAACGCCAAG AACCAAACTC TCCAGCGTCG
351 AGCTCCACCC ACAATGAGTC ACAAAACAAC CTACGGCTCG GTGACTTAAA
401 ACCTCTATCT GCGAACACCA CGACACAATC ATCCCAACCT CTTCAAGCTC
451 GTGTCTGAAT CCAGCTATCT TCTCAATCTC TGTCTCTTCT TCTCCTTCTG
501 TTTTCGTTTC TCTGTTGGAT TTATCAGTTA TAACCCACAA AAACGGTCGT
551 TTCCCTTCTA TGAGTGCTCT CGCTAGTTCC TCTATCTGTT TCTTGGACAA
601 CTCAACCATT GTTCCAAAGG AAACGTAAAT AACAGAGGAC TCTGTTTTCG
651 AGTCTAGCCA AAGTGTATAA CTACTACTTT GATCTTTAAC TGATTTGTTG
701 GTGCTTCCTG AGAAAATCTC CGTGGGAAGT AAAGGACCAA CCGCCACCAT
751 ATCGATATTC GGGAAAGCCG TTAAGGCCTC TGGTTCCAGC GAATCGAAAG
801 TGTTGATGAG AATTTTCGGT TTGGTTTCTT TTATGAGAAA CTCCATCATT
851 TCTTGAAACG CATCGTATGC GCGTTTGTTT GTGTTGGAAG GTGTGAGGAA
901 AGATGGAAGA TCTCTGATTT CCAGAGAAGA CAGATTAGGT AACTCGAAAA
951 CGGACTTGTT TCCCATGAAA TGAGTGTAAT AGATGTTGAA AACCAAAGCC
1001 GGTTGGATCC AGAGAAGAGC GGAGGGAAGT TGAAATCTAC GTGCTACTTT
1051 TGGAGCCCAA TTGAGAAGAA TCGTGTAGAT CAAGCAAGTC ACGGGAGAGT
1101 CACCATTCTT AGTAGCTTCG ATGAAATCCG ATAGTGCGTT ATCGCCGTTA
1151 ACCTTGAGAT TCACCGACCT TTTCTGACGG TCTTCGTAGG TGGAAATGCC
1201 TCCATCGTCG AAACCGTCGG AGAAAGTAAG GAAAGAGAGA TTTTCGACTT
1251 TGTTGTGGTT TGCGATCATG GAGTTGTGGA AGACGGAGAC ACAAGTGACG
1301 AAAGTGACAC GTGCGCCGGT TCTTTTGATG AGCCGACGAG CAAAACGGAG
1351 AGATGGGTTC ACGTGACCTT GCGCCGGAAA CGTTACCAGT AGAAAATGCG
1401 GTGGCGCCAT
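
Each antisense listing is the reverse complement of the corresponding sense listing. A minimal sketch of that check is given below, assuming plain A/C/G/T sequences; the name `reverse_complement` is illustrative and not taken from the application.

```python
# Watson-Crick complement for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, ignoring whitespace."""
    seq = "".join(dna.split()).upper()
    return seq.translate(COMPLEMENT)[::-1]


sense_start = "ATGGCGCCAC CGCATTTTCT"    # Figure 1A, positions 1-20
antisense_end = "AGAAAATGCG GTGGCGCCAT"  # Figure 1C, positions 1391-1410

# The start of the sense strand should pair with the end of the antisense strand.
print(reverse_complement(sense_start) == "".join(antisense_end.split()))  # True
```

Running the same comparison over a whole sense/antisense pair is a convenient way to locate single-character transcription errors in either listing.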

FIGURE 2A A320 SENSE NUCLEOTIDE SEQUENCE

1 ATGGAGCTAG AATCTTCTCC TCCTCTACCT CCTCATGTGA TGCTCGTATC
51 TTTTCCAGGG CAAGGCCACG TTAATCCACT TCTTCGTCTT GGTAAGCTCT
101 TAGCTTCAAA GGGTTTGCTC ATAACCTTCG TCACCACTGA GTCATGGGGC
151 AAAAAGATGC GAATCTCCAA CAAAATCCAA GACCGTGTCC TCAAACCGGT
201 TGGTAAAGGC TATCTCCGGT ATGATTTCTT CGACGACGGG CTTCCTGAAG
251 ACGACGAAGC TAGCAGAACC AACTTAACCA TCCTCCGACC ACATCTAGAG
301 CTGGTCGGCA AAAGAGAGAT CAAGAACCTT GTGAAACGTT ACAAGGAAGT
351 AACGAAACAG CCCGTGACAT GTCTTATCAA CAACCCTTTC GTCTCTTGGG
401 TCTGTGACGT GGCAGAAGAT CTTCAAATCC CTTGTGCTGT TCTTTGGGTT
451 CAATCTTGTG CCTGCTTAGC TGCTTATTAC TATTACCACC ACAACCTAGT
501 TGACTTCCCG ACCAAAACAG AACCCGAGAT CGATGTCCAA ATCTCTGGCA
551 TGCCTCTCTT GAAACATGAC GAGATCCCTT CTTTCATTCA CCCTTCAAGT
601 CCTCACTCCG CTTTGCGAGA AGTGATCATA GATCAGATTA AACGGCTTCA
651 CAAGACTTTC TCCATTTTCA TCGACACTTT CAACTCATTG GAGAAAGACA
701 TCATTGACCA CATGTCGACG CTCTCTCTCC CCGGTGTTAT CAGACCGCTA
751 GGACCACTCT ACAAAATGGC TAAAACCGTA GCTTATGATG TCGTTAAAGT
801 AAACATCTCT GAGCCAACGG ATCCTTGCAT GGAGTGGTTA GACTCGCAGC
851 CAGTTTCCTC CGTTGTTTAC ATCTCATTCG GGACCGTTGC TTACTTGAAA
901 CAAGAACAAA TAGACGAGAT CGCTTACGGT GTGTTAAACG CCGACGTTAC
951 GTTCTTGTGG GTGATTAGAC AACAAGAGTT AGGTTTCAAC AAAGAGAAAC
1001 ATGTTTTGCC GGAAGAAGTT AAAGGGAAAG GGAAGATCGT TGAATGGTGT
1051 TCACAAGAGA AAGTATTATC TCATCCTTCA GTGGCATGTT TCGTGACTCA
1101 CTGTGGATGG AACTCAACGA TGGAAGCTGT GTCTTCCGGA GTCCCGACGG
1151 TTTGTTTTCC TCAATGGGGA GATCAAGTCA CGGACGCCGT TTACATGATC
1201 GATGTTTGGA AGACGGGAGT GAGGCTAAGC CGTGGAGAGG CGGAGGAGAG
1251 GTTAGTGCCG AGGGAGGAAG TTGCGGAGAG GTTGAGAGAG GTTACTAAAG
1301 GAGAGAAAGC GATCGAGTTG AAAAAGAATG CTTTGAAGTG GAAGGAAGAG
1351 GCGGAGGCGG CGGTTGCTCG CGGTGGTTCG TCGGATAGGA ATCTTGAAAA
1401 GTTTGTGGAG AAGTTGGGTG CCAAACCTGT GGGGAAAGTA CAAAACGGGA
1451 GTCATAATCA TGTCTTGGCT GGATCAATCA AAAGCTTTTA A

FIGURE 2B A320 AMINO ACID SEQUENCE

1 MELESSPPLP PHVMLVSFPG QGHVNPLLRL GKLLASKGLL ITFVTTESWG
51 KKMRISNKIQ DRVLKPVGKG YLRYDFFDDG LPEDDEASRT NLTILRPHLE
101 LVGKREIKNL VKRYKEVTKQ PVTCLINNPF VSWVCDVAED LQIPCAVLWV
151 QSCACLAAYY YYHHNLVDFP TKTEPEIDVQ ISGMPLLKHD EIPSFIHPSS
201 PHSALREVII DQIKRLHKTF SIFIDTFNSL EKDIIDHMST LSLPGVIRPL
251 GPLYKMAKTV AYDVVKVNIS EPTDPCMEWL DSQPVSSVVY ISFGTVAYLK
301 QEQIDEIAYG VLNADVTFLW VIRQQELGFN KEKHVLPEEV KGKGKIVEWC
351 SQEKVLSHPS VACFVTHCGW NSTMEAVSSG VPTVCFPQWG DQVTDAVYMI
401 DVWKTGVRLS RGEAEERLVP REEVAERLRE VTKGEKAIEL KKNALKWKEE
451 AEAAVARGGS SDRNLEKFVE KLGAKPVGKV QNGSHNHVLA GSIKSF






WO 01/59140 



PCT/GB01/00477 



FIGURE 2C A320 ANTISENSE NUCLEOTIDE SEQUENCE

1 TTAAAAGCTT TTGATTGATC CAGCCAAGAC ATGATTATGA CTCCCGTTTT
51 GTACTTTCCC CACAGGTTTG GCACCCAACT TCTCCACAAA CTTTTCAAGA
101 TTCCTATCCG ACGAACCACC GCGAGCAACC GCCGCCTCCG CCTCTTCCTT
151 CCACTTCAAA GCATTCTTTT TCAACTCGAT CGCTTTCTCT CCTTTAGTAA
201 CCTCTCTCAA CCTCTCCGCA ACTTCCTCCC TCGGCACTAA CCTCTCCTCC
251 GCCTCTCCAC GGCTTAGCCT CACTCCCGTC TTCCAAACAT CGATCATGTA
301 AACGGCGTCC GTGACTTGAT CTCCCCATTG AGGAAAACAA ACCGTCGGGA
351 CTCCGGAAGA CACAGCTTCC ATCGTTGAGT TCCATCCACA GTGAGTCACG
401 AAACATGCCA CTGAAGGATG AGATAATACT TTCTCTTGTG AACACCATTC
451 AACGATCTTC CCTTTCCCTT TAACTTCTTC CGGCAAAACA TGTTTCTCTT
501 TGTTGAAACC TAACTCTTGT TGTCTAATCA CCCACAAGAA CGTAACGTCG
551 GCGTTTAACA CACCGTAAGC GATCTCGTCT ATTTGTTCTT GTTTCAAGTA
601 AGCAACGGTC CCGAATGAGA TGTAAACAAC GGAGGAAACT GGCTGCGAGT
651 CTAACCACTC CATGCAAGGA TCCGTTGGCT CAGAGATGTT TACTTTAACG
701 ACATCATAAG CTACGGTTTT AGCCATTTTG TAGAGTGGTC CTAGCGGTCT
751 GATAACACCG GGGAGAGAGA GCGTCGACAT GTGGTCAATG ATGTCTTTCT
801 CCAATGAGTT GAAAGTGTCG ATGAAAATGG AGAAAGTCTT GTGAAGCCGT
851 TTAATCTGAT CTATGATCAC TTCTCGCAAA GCGGAGTGAG GACTTGAAGG
901 GTGAATGAAA GAAGGGATCT CGTCATGTTT CAAGAGAGGC ATGCCAGAGA
951 TTTGGACATC GATCTCGGGT TCTGTTTTGG TCGGGAAGTC AACTAGGTTG
1001 TGGTGGTAAT AGTAATAAGC AGCTAAGCAG GCACAAGATT GAACCCAAAG
1051 AACAGCACAA GGGATTTGAA GATCTTCTGC CAGGTCACAG ACCCAAGAGA
1101 CGAAAGGGTT GTTGATAAGA CATGTCACGG GCTGTTTCGT TACTTCCTTG
1151 TAACGTTTCA CAAGGTTCTT GATCTCTCTT TTGCCGACCA GCTCTAGATG
1201 TGGTCGGAGG ATGGTTAAGT TGGTTCTGCT AGCTTCGTCG TCTTCAGGAA
1251 GCCCGTCGTC GAAGAAATCA TACCGGAGAT AGCCTTTACC AACCGGTTTG
1301 AGGACACGGT CTTGGATTTT GTTGGAGATT CGCATCTTTT TGCCCCATGA
1351 CTCAGTGGTG ACGAAGGTTA TGAGCAAACC CTTTGAAGCT AAGAGCTTAC
1401 CAAGACGAAG AAGTGGATTA ACGTGGCCTT GCCCTGGAAA AGATACGAGC
1451 ATCACATGAG GAGGTAGAGG AGGAGAAGAT TCTAGCTCCA T

FIGURE 3A A41 SENSE NUCLEOTIDE SEQUENCE

1 ATGGGATCCA TATCAGAAAT GGTGTTCGAA ACTTGTCCAT CTCCAAACCC
51 AATTCATGTA ATGCTCGTCT CGTTTCAAGG ACAAGGCCAC GTCAACCCTC
101 TTCTTCGTCT CGGCAAGTTA ATTGCTTCAA AGGGTTTACT CGTTACCTTC
151 GTTACAACGG AGCTTTGGGG CAAGAAAATG AGACAAGCCA ACAAAATCGT
201 TGACGGTGAA CTTAAACCGG TTGGTTCCGG TTCAATCCGG TTTGAGTTCT
251 TTGATGAAGA ATGGGCAGAG GATGATGACC GGAGAGCTGA TTTCTCTTTG
301 TACATTGCTC ACCTAGAGAG CGTTGGGATA CGAGAAGTGT CTAAGCTTGT
351 GAGAAGATAC GAGGAAGCGA ACGAGCCTGT CTCGTGTCTT ATCAATAACC
401 CGTTTATCCC ATGGGTCTGC CACGTGGCGG AAGAGTTCAA CATTCCTTGT
451 GCGGTTCTCT GGGTTCAGTC TTGTGCTTGT TTCTCTGCTT ATTACCATTA
501 CCAAGATGGC TCTGTTTCAT TCCCTACGGA AACAGAGCCT GAGCTCGATG
551 TGAAGCTTCC TTGTGTTCCT GTCTTGAAGA ACGACGAGAT TCCTAGCTTT
601 CTCCATCCTT CTTCTAGGTT CACGGGTTTT CGACAAGCGA TTCTTGGGCA
651 ATTCAAGAAT CTGAGCAAGT CCTTCTGTGT TCTAATCGAT TCTTTTGACT
701 CATTGGAACA AGAAGTTATC GATTACATGT CAAGTCTTTG TCCGGTTAAA
751 ACCGTTGGAC CGCTTTTCAA AGTTGCTAGG ACAGTTACTT CTGACGTAAG
801 CGGTGACATT TGCAAATCAA CAGATAAATG CCTCGAGTGG TTAGACTCGA
851 GGCCTAAATC GTCAGTTGTC TACATTTCGT TCGGGACAGT TGCATATTTG
901 AAGCAAGAAC AGATCGAAGA GATCGCTCAC GGAGTTTTGA AGTCGGGTTT
951 ATCGTTCTTG TGGGTGATTA GACCTCCACC ACACGATCTG AAGGTCGAGA
1001 CACATGTCTT GCCTCAAGAA CTTAAAGAGA GTAGTGCTAA AGGTAAAGGG
1051 ATGATTGTGG ATTGGTGCCC ACAAGAGCAA GTCTTGTCTC ATCCTTCAGT
1101 GGCATGCTTC GTGACTCATT GTGGATGGAA CTCGACAATG GAATCTTTGT
1151 CTTCAGGTGT TCCGGTGGTT TGTTGTCCGC AATGGGGAGA TCAAGTGACT
1201 GATGCAGTGT ATTTGATCGA TGTTTTCAAG ACCGGGGTTA GACTAGGCCG
1251 TGGAGCGACC GAGGAGAGGG TAGTGCCAAG GGAGGAAGTG GCGGAGAAGC
1301 TTTTGGAAGC GACAGTTGGG GAGAAGGCAG AGGAGTTGAG AAAGAACGCT
1351 TTGAAATGGA AGGCGGAGGC GGAAGCAGCG GTGGCTCCAG GAGGTTCGTC
1401 GGATAAGAAT TTTAGGGAGT TTGTGGAGAA GTTAGGTGCG GGAGTAACGA
1451 AGACTAAAGA TAATGGATAC TAG

FIGURE 3B A41 AMINO ACID SEQUENCE

1 MVFETCPSPN PIHVMLVSFQ GQGHVNPLLR LGKLIASKGL LVTFVTTELW
51 GKKMRQANKI VDGELKPVGS GSIRFEFFDE EWAEDDDRRA DFSLYIAHLE
101 SVGIREVSKL VRRYEEANEP VSCLINNPFI PWVCHVAEEF NIPCAVLWVQ
151 SCACFSAYYH YQDGSVSFPT ETEPELDVKL PCVPVLKNDE IPSFLHPSSR
201 FTGFRQAILG QFKNLSKSFC VLIDSFDSLE QEVIDYMSSL CPVKTVGPLF
251 KVARTVTSDV SGDICKSTDK CLEWLDSRPK SSVVYISFGT VAYLKQEQIE
301 EIAHGVLKSG LSFLWVIRPP PHDLKVETHV LPQELKESSA KGKGMIVDWC
351 PQEQVLSHPS VACFVTHCGW NSTMESLSSG VPVVCCPQWG DQVTDAVYLI
401 DVFKTGVRLG RGATEERVVP REEVAEKLLE ATVGEKAEEL RKNALKWKAE
451 AEAAVAPGGS SDKNFREFVE KLGAGVTKTK DNGY

FIGURE 3C A41 ANTISENSE NUCLEOTIDE SEQUENCE

1 CTAGTATCCA TTATCTTTAG TCTTCGTTAC TCCCGCACCT AACTTCTCCA 
51 CAAACTCCCT AAAATTCTTA TCCGACGAAC CTCCTGGAGC CACCGCTGCT 
101 TCCGCCTCCG CCTTCCATTT CAAAGCGTTC TTTCTCAACT CCTCTGCCTT 
151 CTCCCCAACT GTCGCTTCCA AAAGCTTCTC CGCCACTTCC TCCCTTGGCA 
201 CTACCCTCTC CTCGGTCGCT CCACGGCCTA GTCTAACCCC GGTCTTGAAA 
251 ACATCGATCA AATACACTGC ATCAGTCACT TGATCTCCCC ATTGCGGACA 
301 ACAAACCACC GGAACACCTG AAGACAAAGA TTCCATTGTC GAGTTCCATC 
351 CACAATGAGT CACGAAGCAT GCCACTGAAG GATGAGACAA GACTTGCTCT 
401 TGTGGGCACC AATCCACAAT CATCCCTTTA CCTTTAGCAC TACTCTCTTT 
451 AAGTTCTTGA GGCAAGACAT GTGTCTCGAC CTTCAGATCG TGTGGTGGAG 
501 GTCTAATCAC CCACAAGAAC GATAAACCCG ACTTCAAAAC TCCGTGAGCG 
551 ATCTCTTCGA TCTGTTCTTG CTTCAAATAT GCAACTGTCC CGAACGAAAT 
601 GTAGACAACT GACGATTTAG GCCTCGAGTC TAACCACTCG AGGCATTTAT 
651 CTGTTGATTT GCAAATGTCA CCGCTTACGT CAGAAGTAAC TGTCCTAGCA 
701 ACTTTGAAAA GCGGTCCAAC GGTTTTAACC GGACAAAGAC TTGACATGTA 
751 ATCGATAACT TCTTGTTCCA ATGAGTCAAA AGAATCGATT AGAACACAGA 
801 AGGACTTGCT CAGATTCTTG AATTGCCCAA GAATCGCTTG TCGAAAACCC 
851 GTGAACCTAG AAGAAGGATG GAGAAAGCTA GGAATCTCGT CGTTCTTCAA 
901 GACAGGAACA CAAGGAAGCT TCACATCGAG CTCAGGCTCT GTTTCCGTAG 
951 GGAATGAAAC AGAGCCATCT TGGTAATGGT AATAAGCAGA GAAACAAGCA
1001 CAAGACTGAA CCCAGAGAAC CGCACAAGGA ATGTTGAACT CTTCCGCCAC 
1051 GTGGCAGACC CATGGGATAA ACGGGTTATT GATAAGACAC GAGACAGGCT 
1101 CGTTCGCTTC CTCGTATCTT CTCACAAGCT TAGACACTTC TCGTATCCCA 
1151 ACGCTCTCTA GGTGAGCAAT GTACAAAGAG AAATCAGCTC TCCGGTCATC
1201 ATCCTCTGCC CATTCTTCAT CAAAGAACTC AAACCGGATT GAACCGGAAC 
1251 CAACCGGTTT AAGTTCACCG TCAACGATTT TGTTGGCTTG TCTCATTTTC 
1301 TTGCCCCAAA GCTCCGTTGT AACGAAGGTA ACGAGTAAAC CCTTTGAAGC 
1351 AATTAACTTG CCGAGACGAA GAAGAGGGTT GACGTGGCCT TGTCCTTGAA 
1401 ACGAGACGAG CATTACATGA ATTGGGTTTG GAGATGGACA AGTTTCGAAC 
1451 ACCATTTCTG ATATGGATCC CAT 

FIGURE 4A A42 SENSE NUCLEOTIDE SEQUENCE

1 ATGGACCCGT CTCGTCATAC TCATGTGATG CTCGTATCTT TCCCCGGCCA
51 AGGTCACGTA AACCCTCTAC TTCGTCTCGG AAAGCTCATA GCCTCTAAAG
101 GCTTACTCGT CACCTTTGTC ACCACAGAGA AGCCATGGGG CAAGAAGATG
151 CGTCAAGCCA ACAAGATTCA AGACGGTGTG CTCAAACCGG TCGGTCTAGG
201 TTTCATCCGG TTTGAGTTCT TCTCTGACGG CTTCGCCGAC GACGATGAAA
251 AAAGATTCGA CTTCGATGCC TTCCGACCAC ACCTTGAAGC TGTCGGAAAA
301 CAAGAGATCA AGAATCTCGT TAAGAGATAT AACAAGGAGC CGGTGACGTG
351 TCTCATAAAC AACGCTTTTG TCCCATGGGT ATGTGATGTC GCCGAGGAGC
401 TTCACATCCC TTCGGCTGTT CTATGGGTCC AGTCTTGTGC TTGTCTCACG
451 GCTTATTACT ATTACCACCA CCGGTTAGTT AAGTTCCCGA CCAAAACCGA
501 GCCGGACATC AGCGTTGAAA TCCCTTGCTT GCCATTGTTA AAGCATGACG
551 AGATCCCAAG CTTTCTTCAC CCTTCGTCTC CGTATACAGC TTTTGGAGAT
601 ATCATTTTAG ACCAGTTAAA GAGATTCGAA AACCACAAGT CTTTCTATCT
651 TTTCATCGAC ACTTTTCGCG AACTAGAAAA AGACATCATG GACCACATGT
701 CACAACTTTG TCCTCAAGCC ATCATCAGTC CTGTCGGTCC GCTCTTCAAG
751 ATGGCTCAAA CCTTGAGTTC TGACGTTAAG GGAGATATAT CCGAGCCAGC
801 GAGTGACTGC ATGGAATGGC TTGACTCAAG AGAACCATCC TCAGTCGTTT
851 ACATCTCCTT TGGGACTATA GCCAACTTGA AGCAAGAGCA GATGGAGGAG
901 ATCGCTCATG GCGTTTTGAG CTCTGGCTTG TCGGTCTTAT GGGTGGTTCG
951 GCCTCCCATG GAAGGGACAT TTGTAGAACC ACATGTTTTG CCTCGAGAGC
1001 TCGAAGAAAA GGGTAAAATC GTGGAATGGT GTCCCCAAGA GAGAGTCTTG
1051 GCTCATCCTG CGATTGCTTG TTTCTTAAGT CACTGCGGAT GGAACTCGAC
1101 AATGGAGGCT TTAACTGCCG GAGTCCCCGT TGTTTGTTTT CCGCAATGGG
1151 GAGATCAAGT GACTGATGCG GTGTACTTGG CTGATGTTTT CAAGACAGGA
1201 GTGAGACTAG GCCGCGGAGC CGCTGAGGAG ATGATTGTTT CGAGGGAGGT
1251 TGTAGCAGAG AAGCTGCTTG AGGCCACAGT TGGGGAAAAG GCGGTGGAGC
1301 TGAGAGAAAA CGCTCGGAGG TGGAAGGCGG AGGCCGAGGC CGCCGTGGCG
1351 GACGGTGGAT CATCTGATAT GAACTTTAAA GAGTTTGTGG ACAAGTTGGT
1401 TACGAAACAT GTGACGAGAG AAGACAACGG AGAACACTAG

FIGURE 4B A42 AMINO ACID SEQUENCE

1 MDPSRHTHVM LVSFPGQGHV NPLLRLGKLI ASKGLLVTFV TTEKPWGKKM
51 RQANKIQDGV LKPVGLGFIR FEFFSDGFAD DDEKRFDFDA FRPHLEAVGK
101 QEIKNLVKRY NKEPVTCLIN NAFVPWVCDV AEELHIPSAV LWVQSCACLT
151 AYYYYHHRLV KFPTKTEPDI SVEIPCLPLL KHDEIPSFLH PSSPYTAFGD
201 IILDQLKRFE NHKSFYLFID TFRELEKDIM DHMSQLCPQA IISPVGPLFK
251 MAQTLSSDVK GDISEPASDC MEWLDSREPS SVVYISFGTI ANLKQEQMEE
301 IAHGVLSSGL SVLWVVRPPM EGTFVEPHVL PRELEEKGKI VEWCPQERVL
351 AHPAIACFLS HCGWNSTMEA LTAGVPVVCF PQWGDQVTDA VYLADVFKTG
401 VRLGRGAAEE MIVSREVVAE KLLEATVGEK AVELRENARR WKAEAEAAVA
451 DGGSSDMNFK EFVDKLVTKH VTREDNGEH

FIGURE 4C A42 ANTISENSE NUCLEOTIDE SEQUENCE

1 CTAGTGTTCT CCGTTGTCTT CTCTCGTCAC ATGTTTCGTA ACCAACTTGT
51 CCACAAACTC TTTAAAGTTC ATATCAGATG ATCCACCGTC CGCCACGGCG
101 GCCTCGGCCT CCGCCTTCCA CCTCCGAGCG TTTTCTCTCA GCTCCACCGC
151 CTTTTCCCCA ACTGTGGCCT CAAGCAGCTT CTCTGCTACA ACCTCCCTCG
201 AAACAATCAT CTCCTCAGCG GCTCCGCGGC CTAGTCTCAC TCCTGTCTTG
251 AAAACATCAG CCAAGTACAC CGCATCAGTC ACTTGATCTC CCCATTGCGG
301 AAAACAAACA ACGGGGACTC CGGCAGTTAA AGCCTCCATT GTCGAGTTCC
351 ATCCGCAGTG ACTTAAGAAA CAAGCAATCG CAGGATGAGC CAAGACTCTC
401 TCTTGGGGAC ACCATTCCAC GATTTTACCC TTTTCTTCGA GCTCTCGAGG
451 CAAAACATGT GGTTCTACAA ATGTCCCTTC CATGGGAGGC CGAACCACCC
501 ATAAGACCGA CAAGCCAGAG CTCAAAACGC CATGAGCGAT CTCCTCCATC
551 TGCTCTTGCT TCAAGTTGGC TATAGTCCCA AAGGAGATGT AAACGACTGA
601 GGATGGTTCT CTTGAGTCAA GCCATTCCAT GCAGTCACTC GCTGGCTCGG
651 ATATATCTCC CTTAACGTCA GAACTCAAGG TTTGAGCCAT CTTGAAGAGC
701 GGACCGACAG GACTGATGAT GGCTTGAGGA CAAAGTTGTG ACATGTGGTC
751 CATGATGTCT TTTTCTAGTT CGCGAAAAGT GTCGATGAAA AGATAGAAAG
801 ACTTGTGGTT TTCGAATCTC TTTAACTGGT CTAAAATGAT ATCTCCAAAA
851 GCTGTATACG GAGACGAAGG GTGAAGAAAG CTTGGGATCT CGTCATGCTT
901 TAACAATGGC AAGCAAGGGA TTTCAACGCT GATGTCCGGC TCGGTTTTGG
951 TCGGGAACTT AACTAACCGG TGGTGGTAAT AGTAATAAGC CGTGAGACAA
1001 GCACAAGACT GGACCCATAG AACAGCCGAA GGGATGTGAA GCTCCTCGGC
1051 GACATCACAT ACCCATGGGA CAAAAGCGTT GTTTATGAGA CACGTCACCG
1101 GCTCCTTGTT ATATCTCTTA ACGAGATTCT TGATCTCTTG TTTTCCGACA
1151 GCTTCAAGGT GTGGTCGGAA GGCATCGAAG TCGAATCTTT TTTCATCGTC
1201 GTCGGCGAAG CCGTCAGAGA AGAACTCAAA CCGGATGAAA CCTAGACCGA
1251 CCGGTTTGAG CACACCGTCT TGAATCTTGT TGGCTTGACG CATCTTCTTG
1301 CCCCATGGCT TCTCTGTGGT GACAAAGGTG ACGAGTAAGC CTTTAGAGGC
1351 TATGAGCTTT CCGAGACGAA GTAGAGGGTT TACGTGACCT TGGCCGGGGA
1401 AAGATACGAG CATCACATGA GTATGACGAG ACGGGTCCAT

FIGURE 5A A43 SENSE NUCLEOTIDE SEQUENCE

1 ATGGAGATGG AATCGTCGTT ACCTCATGTG ATGCTCGTAT CATTCCCAGG
51 GCAAGGTCAC ATAAGCCCTC TTCTTCGTCT CGGAAAGATC ATTGCCTCTA
101 AAGGCTTAAT CGTCACCTTT GTAACCACAG AGGAACCATT GGGCAAGAAG
151 ATGCGTCAAG CCAACAATAT TCAAGACGGT GTGCTCAAAC CGGTCGGGCT
201 AGGTTTTCTC CGGTTCGAGT TCTTCGAGGA TGGATTTGTC TACAAAGAAG
251 ACTTTGATTT GTTACAAAAA TCACTTGAAG TTTCCGGAAA ACGAGAGATC
301 AAGAATCTTG TCAAGAAATA TGAGAAGCAA CCAGTGAGAT GTCTCATAAA
351 TAATGCCTTT GTTCCATGGG TTTGTGACAT AGCCGAGGAG CTTCAAATCC
401 CATCAGCTGT TCTTTGGGTC CAGTCTTGTG CTTGCCTCGC CGCTTATTAC
451 TATTACCACC ACCAGTTAGT TAAGTTTCCG ACCGAAACCG AGCCGGAAAT
501 AACCGTTGAC GTCCCTTTCA AGCCATTAAC ATTGAAGCAT GACGAGATCC
551 CTAGCTTTCT TCACCCTTCC TCTCCGCTGT CCTCTATAGG AGGTACCATT
601 TTAGAGCAGA TCAAGCGACT TCACAAGCCT TTCTCTGTTC TCATCGAAAC
651 TTTTCAAGAA CTTGAAAAAG ATACCATTGA CCACATGTCC CAGCTCTGCC
701 CTCAAGTCAA CTTCAACCCC ATCGGTCCGC TTTTTACTAT GGCTAAAACC
751 ATAAGGTCTG ACATCAAGGG AGACATCTCC AAGCCAGATA GTGACTGCAT
801 AGAGTGGCTT GACTCGAGAG AACCATCCTC CGTTGTTTAC ATCTCTTTTG
851 GGACTTTGGC TTTCTTGAAG CAAAACCAGA TCGACGAGAT TGCTCACGGC
901 ATTCTCAACT CCGGGTTGTC CTGCTTATGG GTTTTGCGGC CTCCCTTAGA
951 AGGCTTAGCC ATAGAACCGC ATGTCTTGCC TCTAGAGCTT GAAGAGAAAG
1001 GGAAGATTGT GGAATGGTGT CAACAAGAGA AAGTTTTGGC TCATCCTGCG
1051 GTTGCTTGCT TCTTAAGTCA CTGTGGATGG AACTCAACCA TGGAGGCTTT
1101 AACTTCAGGA GTTCCCGTTA TTTGTTTCCC GCAGTGGGGA GATCAGGTGA
1151 CAAATGCGGT GTACATGATT GATGTTTTCA AGACAGGATT GAGACTCAGC
1201 CGTGGAGCTT CCGATGAGAG GATTGTTCCA AGGGAGGAGG TTGCTGAGCG
1251 ACTGCTTGAG GCCACCGTTG GAGAGAAGGC GGTGGAGCTG AGAGAAAACG
1301 CTCGGAGGTG GAAGGAGGAG GCGGAGTCTG CCGTGGCTTA CGGTGGAACA
1351 TCGGAAAGGA ATTTTCAAGA GTTTGTTGAC AAGTTGGTTG ATGTCAAGAC
1401 AATGACAAAC ATTAATAATG TCGTGTAAGT

FIGURE 5B A43 AMINO ACID SEQUENCE

1 MEMESSLPHV MLVSFPGQGH ISPLLRLGKI IASKGLIVTF VTTEEPLGKK
51 MRQANNIQDG VLKPVGLGFL RFEFFEDGFV YKEDFDLLQK SLEVSGKREI
101 KNLVKKYEKQ PVRCLINNAF VPWVCDIAEE LQIPSAVLWV QSCACLAAYY
151 YYHHQLVKFP TETEPEITVD VPFKPLTLKH DEIPSFLHPS SPLSSIGGTI
201 LEQIKRLHKP FSVLIETFQE LEKDTIDHMS QLCPQVNFNP IGPLFTMAKT
251 IRSDIKGDIS KPDSDCIEWL DSREPSSVVY ISFGTLAFLK QNQIDEIAHG
301 ILNSGLSCLW VLRPPLEGLA IEPHVLPLEL EEKGKIVEWC QQEKVLAHPA
351 VACFLSHCGW NSTMEALTSG VPVICFPQWG DQVTNAVYMI DVFKTGLRLS
401 RGASDERIVP REEVAERLLE ATVGEKAVEL RENARRWKEE AESAVAYGGT
451 SERNFQEFVD KLVDVKTMTN INNVV

FIGURE 5C A43 ANTISENSE NUCLEOTIDE SEQUENCE

1 ACTTACACGA CATTATTAAT GTTTGTCATT GTCTTGACAT CAACCAACTT
51 GTCAACAAAC TCTTGAAAAT TCCTTTCCGA TGTTCCACCG TAAGCCACGG
101 CAGACTCCGC CTCCTCCTTC CACCTCCGAG CGTTTTCTCT CAGCTCCACC
151 GCCTTCTCTC CAACGGTGGC CTCAAGCAGT CGCTCAGCAA CCTCCTCCCT
201 TGGAACAATC CTCTCATCGG AAGCTCCACG GCTGAGTCTC AATCCTGTCT
251 TGAAAACATC AATCATGTAC ACCGCATTTG TCACCTGATC TCCCCACTGC
301 GGGAAACAAA TAACGGGAAC TCCTGAAGTT AAAGCCTCCA TGGTTGAGTT
351 CCATCCACAG TGACTTAAGA AGCAAGCAAC CGCAGGATGA GCCAAAACTT
401 TCTCTTGTTG ACACCATTCC ACAATCTTCC CTTTCTCTTC AAGCTCTAGA
451 GGCAAGACAT GCGGTTCTAT GGCTAAGCCT TCTAAGGGAG GCCGCAAAAC
501 CCATAAGCAG GACAACCCGG AGTTGAGAAT GCCGTGAGCA ATCTCGTCGA
551 TCTGGTTTTG CTTCAAGAAA GCCAAAGTCC CAAAAGAGAT GTAAACAACG
601 GAGGATGGTT CTCTCGAGTC AAGCCACTCT ATGCAGTCAC TATCTGGCTT
651 GGAGATGTCT CCCTTGATGT CAGACCTTAT GGTTTTAGCC ATAGTAAAAA
701 GCGGACCGAT GGGGTTGAAG TTGACTTGAG GGCAGAGCTG GGACATGTGG
751 TCAATGGTAT CTTTTTCAAG TTCTTGAAAA GTTTCGATGA GAACAGAGAA
801 AGGCTTGTGA AGTCGCTTGA TCTGCTCTAA AATGGTACCT CCTATAGAGG
851 ACAGCGGAGA GGAAGGGTGA AGAAAGCTAG GGATCTCGTC ATGCTTCAAT
901 GTTAATGGCT TGAAAGGGAC GTCAACGGTT ATTTCCGGCT CGGTTTCGGT
951 CGGAAACTTA ACTAACTGGT GGTGGTAATA GTAATAAGCG GCGAGGCAAG
1001 CACAAGACTG GACCCAAAGA ACAGCTGATG GGATTTGAAG CTCCTCGGCT
1051 ATGTCACAAA CCCATGGAAC AAAGGCATTA TTTATGAGAC ATCTCACTGG
1101 TTGCTTCTCA TATTTCTTGA CAAGATTCTT GATCTCTCGT TTTCCGGAAA
1151 CTTCAAGTGA TTTTTGTAAC AAATCAAAGT CTTCTTTGTA GACAAATCCA
1201 TCCTCGAAGA ACTCGAACCG GAGAAAACCT AGCCCGACCG GTTTGAGCAC
1251 ACCGTCTTGA ATATTGTTGG CTTGACGCAT CTTCTTGCCC AATGGTTCCT
1301 CTGTGGTTAC AAAGGTGACG ATTAAGCCTT TAGAGGCAAT GATCTTTCCG
1351 AGACGAAGAA GAGGGCTTAT GTGACCTTGC CCTGGGAATG ATACGAGCAT
1401 CACATGAGGT AACGACGATT CCATCTCCAT

FIGURE 6A A911 SENSE NUCLEOTIDE SEQUENCE

1 ATGGGCAGTA GTGAGGGTCA AGAAACACAT GTCCTAATGG TAACACTACC
51 ATTCCAAGGT CACATCAATC CAATGCTCAA ACTCGCAAAA CATCTCTCGT
101 TATCATCAAA GAACCTACAC ATCAATCTCG CCACTATTGA GTCAGCCCGT
151 GATCTCCTCT CCACCGTAGA AAAACCTCGT TATCCGGTGG ACCTCGTGTT
201 CTTCTCCGAT GGTCTACCTA AAGAAGATCC AAAGGCCCCT GAAACTCTTT
251 TGAAGTCATT GAATAAAGTC GGAGCCATGA ACTTGTCTAA AATCATCGAA
301 GAAAAGAGAT ACTCTTGTAT CATCTCTTCG CCTTTTACTC CATGGGTTCC
351 AGCTGTTGCA GCCTCTCATA ACATCTCTTG TGCAATACTT TGGATCCAAG
401 CTTGTGGAGC TTACTCGGTT TATTACCGTT ACTACATGAA GACAAACTCT
451 TTCCCTGATC TTGAAGATCT GAATCAAACG GTGGAGTTAC CAGCTTTACC
501 ATTGTTGGAA GTTCGAGATC TTCCATCGTT TATGTTACCT TCTGGTGGTG
551 CTCACTTCTA TAATCTAATG GCGGAATTTG CAGATTGTTT GAGGTATGTG
601 AAATGGGTTT TGGTTAATTC ATTCTATGAA CTCGAATCAG AGATAATCGA
651 ATCGATGGCT GATTTAAAAC CTGTAATTCC AATTGGTCCT CTGGTTTCTC
701 CATTTCTGTT GGGCGATGGT GAGGAGGAAA CCCTAGACGG TAAAAACCTA
751 GATTTTTGTA AATCTGATGA TTGTTGTATG GAGTGGCTTG ACAAGCAAGC
801 TAGGTCTTCT GTTGTGTACA TATCTTTCGG AAGTATGCTC GAAACATTGG
851 AGAATCAGGT CGAGACCATA GCGAAGGCGC TGAAGAACAG AGGACTTCCA
901 TTTCTTTGGG TGATAAGGCC AAAGGAGAAA GCCCAAAACG TTGCTGTTTT
951 GCAGGAGATG GTGAAAGAAG GACAAGGGGT TGTTCTCGAG TGGAGTCCAC
1001 AAGAGAAGAT TTTGAGCCAC GAGGCAATCT CTTGTTTTGT CACGCATTGC
1051 GGCTGGAACT CGACTATGGA GACGGTGGTG GCTGGTGTTC CTGTGGTAGC
1101 GTACCCTAGC TGGACGGATC AGCCCATTGA CGCGCGGTTG CTTGTTGATG
1151 TGTTTGGAAT CGGAGTAAGG ATGAGGAATG ACAGTGTCGA TGGCGAGCTT
1201 AAGGTCGAAG AAGTAGAAAG ATGCATTGAG GCCGTGACGG AGGGACCCGC
1251 TGCCGTGGAT ATAAGAAGGA GAGCGGCGGA GCTAAAGCGC GTGGCGAGAT
1301 TGGCGTTGGC ACCTGGTGGA TCTTCGACAC GGAATTTAGA CTTGTTCATT
1351 AGTGATATCA CAATCGCCTA ACTCTTTACT TCAACTAGTA CAAAATGTAT
1401 GAATACAAGG TTTGATATAA CCACTATCAA TTGTTAG

FIGURE 6B A911 AMINO ACID SEQUENCE

1 MGSSEGQETH VLMVTLPFQG HINPMLKLAK HLSLSSKNLH INLATIESAR
51 DLLSTVEKPR YPVDLVFFSD GLPKEDPKAP ETLLKSLNKV GAMNLSKIIE
101 EKRYSCIISS PFTPWVPAVA ASHNISCAIL WIQACGAYSV YYRYYMKTNS
151 FPDLEDLNQT VELPALPLLE VRDLPSFMLP SGGAHFYNLM AEFADCLRYV
201 KWVLVNSFYE LESEIIESMA DLKPVIPIGP LVSPFLLGDG EEETLDGKNL
251 DFCKSDDCCM EWLDKQARSS VVYISFGSML ETLENQVETI AKALKNRGLP
301 FLWVIRPKEK AQNVAVLQEM VKEGQGVVLE WSPQEKILSH EAISCFVTHC
351 GWNSTMETVV AGVPVVAYPS WTDQPIDARL LVDVFGIGVR MRNDSVDGEL
401 KVEEVERCIE AVTEGPAAVD IRRRAAELKR VARLALAPGG SSTRNLDLFI
451 SDITIA










FIGURE 6C A911 ANTISENSE NUCLEOTIDE SEQUENCE

1 CTAACAATTG ATAGTGGTTA TATCAAACCT TGTATTCATA CATTTTGTAC
51 TAGTTGAAGT AAAGAGTTAG GCGATTGTGA TATCACTAAT GAACAAGTCT
101 AAATTCCGTG TCGAAGATCC ACCAGGTGCC AACGCCAATC TCGCCACGCG
151 CTTTAGCTCC GCCGCTCTCC TTCTTATATC CACGGCAGCG GGTCCCTCCG
201 TCACGGCCTC AATGCATCTT TCTACTTCTT CGACCTTAAG CTCGCCATCG
251 ACACTGTCAT TCCTCATCCT TACTCCGATT CCAAACACAT CAACAAGCAA
301 CCGCGCGTCA ATGGGCTGAT CCGTCCAGCT AGGGTACGCT ACCACAGGAA
351 CACCAGCCAC CACCGTCTCC ATAGTCGAGT TCCAGCCGCA ATGCGTGACA
401 AAACAAGAGA TTGCCTCGTG GCTCAAAATC TTCTCTTGTG GACTCCACTC
451 GAGAACAACC CCTTGTCCTT CTTTCACCAT CTCCTGCAAA ACAGCAACGT
501 TTTGGGCTTT CTCCTTTGGC CTTATCACCC AAAGAAATGG AAGTCCTCTG
551 TTCTTCAGCG CCTTCGCTAT GGTCTCGACC TGATTCTCCA ATGTTTCGAG
601 CATACTTCCG AAAGATATGT ACACAACAGA AGACCTAGCT TGCTTGTCAA
651 GCCACTCCAT ACAACAATCA TCAGATTTAC AAAAATCTAG GTTTTTACCG
701 TCTAGGGTTT CCTCCTCACC ATCGCCCAAC AGAAATGGAG AAACCAGAGG
751 ACCAATTGGA ATTACAGGTT TTAAATCAGC CATCGATTCG ATTATCTCTG
801 ATTCGAGTTC ATAGAATGAA TTAACCAAAA CCCATTTCAC ATACCTCAAA
851 CAATCTGCAA ATTCCGCCAT TAGATTATAG AAGTGAGCAC CACCAGAAGG
901 TAACATAAAC GATGGAAGAT CTCGAACTTC CAACAATGGT AAAGCTGGTA
951 ACTCCACCGT TTGATTCAGA TCTTCAAGAT CAGGGAAAGA GTTTGTCTTC
1001 ATGTAGTAAC GGTAATAAAC CGAGTAAGCT CCACAAGCTT GGATCCAAAG
1051 TATTGCACAA GAGATGTTAT GAGAGGCTGC AACAGCTGGA ACCCATGGAG
1101 TAAAAGGCGA AGAGATGATA CAAGAGTATC TCTTTTCTTC GATGATTTTA
1151 GACAAGTTCA TGGCTCCGAC TTTATTCAAT GACTTCAAAA GAGTTTCAGG
1201 GGCCTTTGGA TCTTCTTTAG GTAGACCATC GGAGAAGAAC ACGAGGTCCA
1251 CCGGATAACG AGGTTTTTCT ACGGTGGAGA GGAGATCACG GGCTGACTCA
1301 ATAGTGGCGA GATTGATGTG TAGGTTCTTT GATGATAACG AGAGATGTTT
1351 TGCGAGTTTG AGCATTGGAT TGATGTGACC TTGGAATGGT AGTGTTACCA
1401 TTAGGACATG TGTTTCTTGA CCCTCACTAC TGCCCAT
FIGURE 7A A119 SENSE NUCLEOTIDE SEQUENCE

1 ATGCATATCA CAAAACCACA CGCCGCCATG TTTTCCAGTC CCGGAATGGG
51 CCATGTCATC CCGGTGATCG AGCTTGGAAA GCGTCTCTCC GCTAACAACG
101 GCTTCCACGT CACCGTCTTC GTCCTCGAAA CCGACGCAGC CTCCGCTCAA
151 TCCAAGTTCC TAAACTCAAC CGGCGTCGAC ATCGTCAAAC TTCCATCGCC
201 GGACATTTAT GGTTTAGTGG ACCCCGACGA CCATGTAGTG ACCAAGATCG
251 GAGTCATTAT GCGTGCAGCA GTTCCAGCCC TCCGATCCAA GATCGCTGCC
301 ATGCATCAAA AGCCAACGGC TCTGATCGTT GACTTGTTTG GCACAGATGC
351 GTTATGTCTC GCAAAGGAAT TTAACATGTT GAGTTATGTG TTTATCCCTA
401 CCAACGCACG TTTTCTCGGA GTTTCGATTT ATTATCCAAA TTTGGACAAA
451 GATATCAAGG AAGAGCACAC AGTGCAAAGA AACCCACTCG CTATACCGGG
501 GTGTGAACCG GTTAGGTTCG AAGATACTCT GGATGCATAT CTGGTTCCCG
551 ACGAACCGGT GTACCGGGAT TTTGTTCGTC ATGGTCTGGC TTACCCAAAA
601 GCCGATGGAA TTTTGGTAAA TACATGGGAA GAGATGGAGC CCAAATCATT
651 GAAGTCCCTT CTAAACCCAA AGCTCTTGGG CCGGGTTGCT CGTGTACCGG
701 TCTATCCAAT CGGTCCCTTA TGCAGACCGA TACAATCATC CGAAACCGAT
751 CACCCGGTTT TGGATTGGTT AAACGAACAA CCGAACGAGT CGGTTCTCTA
801 TATCTCCTTC GGGAGTGGTG GTTGTCTATC GGCGAAACAG TTAACTGAAT
851 TGGCGTGGGG ACTCGAGCAG AGCCAGCAAC GGTTCGTATG GGTGGTTCGA
901 CCACCGGTCG ACGGTTCGTG TTGTAGCGAG TATGTCTCGG CTAACGGTGG
951 TGGAACCGAA GACAACACGC CAGAGTATCT ACCGGAAGGG TTCGTGAGTC
1001 GTACTAGTGA TAGAGGTTTC GTGGTCCCCT CATGGGCCCC ACAAGCTGAA
1051 ATCCTGTCCC ATCGGGCCGT TGGTGGGTTT TTGACCCATT GCGGTTGGAG
1101 CTCGACGTTG GAAAGCGTCG TTGGCGGCGT TCCGATGATC GCATGGCCAC
1151 TTTTTGCCGA GCAGAATATG AATGCGGCGT TGCTCAGCGA CGAACTGGGA
1201 ATCGCAGTCA GATTGGATGA TCCAAAGGAG GATATTTCTA GGTGGAAGAT
1251 TGAGGCGTTG GTGAGGAAGG TTATGACTGA GAAGGAAGGT GAAGCGATGA
1301 GAAGGAAAGT GAAGAAGTTG AGAGACTCGG CGGAGATGTC ACTGAGCATT
1351 GACGGTGGTG GTTTGGCGCA CGAGTCGCTT TGCAGAGTCA CCAAGGAGTG
1401 TCAACGGTTT TTGGAACGTG TCGTGGACTT GTCACGTGGT GCTTAG
FIGURE 7B A119 AMINO ACID SEQUENCE

1 MHITKPHAAM FSSPGMGHVI PVIELGKRLS ANNGFHVTVF VLETDAASAQ
51 SKFLNSTGVD IVKLPSPDIY GLVDPDDHW TKIGVIMRAA VPALRSKIAA
101 MHQKPTALIV DLFGTDALCL AKEFNMLSYV FIPTNARFLG VSIYYPNLDK
151 DIKEEHTVQR NPLAIPGCEP VRFEDTLDAY LVPDEPVYRD FVRHGLAYPK
201 ADGILVNTWE EMEPKSLKSL LNPKLLGRVA RVPVYPIGPL CRPIQSSETD
251 HPVLDWLNEQ PNESVLYISF GSGGCLSAKQ LTELAWGLEQ SQQRFVWVR
301 PPVDGSCCSE YVSANGGGTE DNTPEYLPEG FVSRTSDRGF WPSWAPQAE
351 ILSHRAVGGF LTHCGWSSTL ESWGGVPMI AWPLFAEQNM NAALLSDELG
401 IAVRLDDPKE DISRWKIEAL VRKVMTEKEG EAMRRKVKKL RDSAEMSLSI
451 DGGGLAHESL CRVTKECQRF LERWDLSRG A
FIGURE 7C A119 ANTISENSE NUCLEOTIDE SEQUENCE

1 CTAAGCACCA CGTGACAAGT CCACGACACG TTCCAAAAAC CGTTGACACT
51 CCTTGGTGAC TCTGCAAAGC GACTCGTGCG CCAAACCACC ACCGTCAATG
101 CTCAGTGACA TCTCCGCCGA GTCTCTCAAC TTCTTCACTT TCCTTCTCAT
151 CGCTTCACCT TCCTTCTCAG TCATAACCTT CCTCACCAAC GCCTCAATCT
201 TCCACCTAGA AATATCCTCC TTTGGATCAT CCAATCTGAC TGCGATTCCC
251 AGTTCGTCGC TGAGCAACGC CGCATTCATA TTCTGCTCGG CAAAAAGTGG
301 CCATGCGATC ATCGGAACGC CGCCAACGAC GCTTTCCAAC GTCGAGCTCC
351 AACCGCAATG GGTCAAAAAC CCACCAACGG CCCGATGGGA CAGGATTTCA
401 GCTTGTGGGG CCCATGAGGG GACCACGAAA CCTCTATCAC TAGTACGACT
451 CACGAACCCT TCCGGTAGAT ACTCTGGCGT GTTGTCTTCG GTTCCACCAC
501 CGTTAGCCGA GACATACTCG CTACAACACG AACCGTCGAC CGGTGGTCGA
551 ACCACCCATA CGAACCGTTG CTGGCTCTGC TCGAGTCCCC ACGCCAATTC
601 AGTTAACTGT TTCGCCGATA GACAACCACC ACTCCCGAAG GAGATATAGA
651 GAACCGACTC GTTCGGTTGT TCGTTTAACC AATCCAAAAC CGGGTGATCG
701 GTTTCGGATG ATTGTATCGG TCTGCATAAG GGACCGATTG GATAGACCGG
751 TACACGAGCA ACCCGGCCCA AGAGCTTTGG GTTTAGAAGG GACTTCAATG
801 ATTTGGGCTC CATCTCTTCC CATGTATTTA CCAAAATTCC ATCGGCTTTT
851 GGGTAAGCCA GACCATGACG AACAAAATCC CGGTACACCG GTTCGTCGGG
901 AACCAGATAT GCATCCAGAG TATCTTCGAA CCTAACCGGT TCACACCCCG
951 GTATAGCGAG TGGGTTTCTT TGCACTGTGT GCTCTTCCTT GATATCTTTG
1001 TCCAAATTTG GATAATAAAT CGAAACTCCG AGAAAACGTG CGTTGGTAGG
1051 GATAAACACA TAACTCAACA TGTTAAATTC CTTTGCGAGA CATAACGCAT
1101 CTGTGCCAAA CAAGTCAACG ATCAGAGCCG TTGGCTTTTG ATGCATGGCA
1151 GCGATCTTGG ATCGGAGGGC TGGAACTGCT GCACGCATAA TGACTCCGAT
1201 CTTGGTCACT ACATGGTCGT CGGGGTCCAC TAAACCATAA ATGTCCGGCG
1251 ATGGAAGTTT GACGATGTCG ACGCCGGTTG AGTTTAGGAA CTTGGATTGA
1301 GCGGAGGCTG CGTCGGTTTC GAGGACGAAG ACGGTGACGT GGAAGCCGTT
1351 GTTAGCGGAG AGACGCTTTC CAAGCTCGAT CACCGGGATG ACATGGCCCA
1401 TTCCGGGACT GGAAAACATG GCGGCGTGTG GTTTTGTGAT ATGCAT
FIGURE 8A A233 SENSE NUCLEOTIDE SEQUENCE

1 ATGAGTAGTG ATCCTCATCG TAAGCTCCAT GTTGTGTTCT TCCCTTTCAT
51 GGCTTATGGT CACATGATAC CAACTCTAGA CATGGCTAAG CTTTTCTCTA
101 GCAGAGGAGC CAAATCTACA ATCCTCACCA CACCTCTCAA CTCCAAGATC
151 TTCCAAAAAC CCATCGAAAG ATTCAAGAAC CTGAATCCGA GTTTCGAAAT
201 CGACATCCAG ATCTTCGATT TCCCTTGCGT GGATCTCGGG TTACCAGAAG
251 GATGCGAAAA CGTCGATTTC TTCACCTCAA ACAACAATGA TGATAGACAG
301 TATCTGACCT TGAAGTTCTT TAAGTCGACA AGGTTTTTCA AAGATCAGCT
351 TGAGAAGCTC CTCGAGACAA CGAGACCAGA CTGTCTTATC GCCGACATGT
401 TCTTCCCCTG GGCTACGGAA GCTGCTGAGA AGTTCAATGT GCCAAGACTT
451 GTGTTCCACG GTACTGGCTA CTTTTCTTTA TGCTCTGAAT ATTGCATCAG
501 AGTGCATAAC CCACAAAACA TAGTAGCTTC AAGGTACGAG CCATTTGTGA
551 TTCCTGATCT CCCGGGGAAC ATAGTGATAA CTCAAGAACA GATAGCAGAC
601 CGTGACGAAG AAAGCGAGAT GGGGAAGTTT ATGATTGAGG TCAAAGAATC
651 TGATGTGAAG AGCTCAGGTG TTATTGTAAA CAGCTTCTAC GAGCTTGAAC
701 CTGATTACGC CGACTTTTAC AAGAGTGTTG TACTGAAGAG AGCGTGGCAT
751 ATCGGTCCGC TTTCGGTTTA CAACAGAGGA TTTGAGGAGA AGGCTGAGAG
801 AGGAAAGAAA GCAAGCATTA ATGAGGTTGA ATGCCTCAAA TGGCTTGACT
851 CCAAGAAACC AGATTCAGTC ATTTACATTT CTTTTGGGAG CGTGGCTTGC
901 TTCAAGAACG AGCAGCTATT CGAGATCGCT GCAGGATTAG AAACTTCTGG
951 AGCAAATTTC ATCTGGGTTG TTAGGAAAAA CATAGGTATT GAAAAAGAAG
1001 AATGGTTACC AGAAGGGTTC GAAGAGAGGG TGAAAGGAAA AGGGATGATT
1051 ATAAGAGGAT GGGCACCACA GGTGCTCATA CTTGATCATC AAGCAACTTG
1101 TGGGTTTGTG ACCCATTGCG GCTGGAACTC GCTTCTGGAA GGAGTGGCTG
1151 CAGGGCTACC AATGGTGACA TGGCCTGTAG CAGCGGAGCA ATTCTACAAT
1201 GAGAAATTGG TTACGCAAGT GCTCAGAACA GGAGTGAGCG TGGGAGCGAA
1251 AAAGAATGTA AGAACTACGG GAGATTTCAT TAGCAGAGAG AAAGTGGTTA
1301 AAGCGGTGAG GGAGGTGTTG GTTGGGGAAG AGGCGGATGA GAGGCGGGAG
1351 AGGGCAAAGA AGTTGGCAGA GATGGCTAAA GCTGCCGTGG AAGGAGGGTC
1401 TTCTTTCAAC GATCTAAACA GCTTCATAGA AGAGTTTACC TCGTAA



FIGURE 8B A233 AMINO ACID SEQUENCE

1 MSSDPHRKLH VVFFPFMAYG HMIPTLDMAK LFSSRGAKST ILTTPLNSKI 

51 FQKPIERFKN LNPSFEIDIQ IFDFPCVDLG LPEGCENVDF FTSNNNDDRQ 

101 YLTLKFFKST RFFKDQLEKL LETTRPDCLI ADMFFPWATE AAEKFNVPRL 

151 VFHGTGYFSL CSEYCIRVHN PQNIVASRYE PFVIPDLPGN IVITQEQIAD

201 RDEESEMGKF MIEVKESDVK SSGVIVNSFY ELEPDYADFY KSWLKRAWH 

251 IGPLSVYNRG FEEKAERGKK ASINEVECLK WLDSKKPDSV IYISFGSVAC 

301 FKNEQLFEIA AGLETSGANF IWVVRKNIGI EKEEWLPEGF EERVKGKGMI 

351 IRGWAPQVLI LDHQATCGFV THCGWNSLLE GVAAGLPMVT WPVAAEQFYN 

401 EKLVTQVLRT GVSVGAKKNV RTTGDFISRE KWKAVREVL VGEEADERRE 

451 RAKKLAEMAK AAVEGGSSFN DLNSFIEEFT S 
FIGURE 8C A233 ANTISENSE NUCLEOTIDE SEQUENCE

1 TTACGAGGTA AACTCTTCTA TGAAGCTGTT TAGATCGTTG AAAGAAGACC
51 CTCCTTCCAC GGCAGCTTTA GCCATCTCTG CCAACTTCTT TGCCCTCTCC
101 CGCCTCTCAT CCGCCTCTTC CCCAACCAAC ACCTCCCTCA CCGCTTTAAC
151 CACTTTCTCT CTGCTAATGA AATCTCCCGT AGTTCTTACA TTCTTTTTCG
201 CTCCCACGCT CACTCCTGTT CTGAGCACTT GCGTAACCAA TTTCTCATTG
251 TAGAATTGCT CCGCTGCTAC AGGCCATGTC ACCATTGGTA GCCCTGCAGC
301 CACTCCTTCC AGAAGCGAGT TCCAGCCGCA ATGGGTCACA AACCCACAAG
351 TTGCTTGATG ATCAAGTATG AGCACCTGTG GTGCCCATCC TCTTATAATC
401 ATCCCTTTTC CTTTCACCCT CTCTTCGAAC CCTTCTGGTA ACCATTCTTC
451 TTTTTCAATA CCTATGTTTT TCCTAACAAC CCAGATGAAA TTTGCTCCAG
501 AAGTTTCTAA TCCTGCAGCG ATCTCGAATA GCTGCTCGTT CTTGAAGCAA
551 GCCACGCTCC CAAAAGAAAT GTAAATGACT GAATCTGGTT TCTTGGAGTC
601 AAGCCATTTG AGGCATTCAA CCTCATTAAT GCTTGCTTTC TTTCCTCTCT
651 CAGCCTTCTC CTCAAATCCT CTGTTGTAAA CCGAAAGCGG ACCGATATGC
701 CACGCTCTCT TCAGTACAAC ACTCTTGTAA AAGTCGGCGT AATCAGGTTC
751 AAGCTCGTAG AAGCTGTTTA CAATAACACC TGAGCTCTTC ACATCAGATT
801 CTTTGACCTC AATCATAAAC TTCCCCATCT CGCTTTCTTC GTCACGGTCT
851 GCTATCTGTT CTTGAGTTAT CACTATGTTC CCCGGGAGAT CAGGAATCAC
901 AAATGGCTCG TACCTTGAAG CTACTATGTT TTGTGGGTTA TGCACTCTGA
951 TGCAATATTC AGAGCATAAA GAAAAGTAGC CAGTACCGTG GAACACAAGT
1001 CTTGGCACAT TGAACTTCTC AGCAGCTTCC GTAGCCCAGG GGAAGAACAT
1051 GTCGGCGATA AGACAGTCTG GTCTCGTTGT CTCGAGGAGC TTCTCAAGCT
1101 GATCTTTGAA AAACCTTGTC GACTTAAAGA ACTTCAAGGT CAGATACTGT
1151 CTATCATCAT TGTTGTTTGA GGTGAAGAAA TCGACGTTTT CGCATCCTTC
1201 TGGTAACCCG AGATCCACGC AAGGGAAATC GAAGATCTGG ATGTCGATTT
1251 CGAAACTCGG ATTCAGGTTC TTGAATCTTT CGATGGGTTT TTGGAAGATC
1301 TTGGAGTTGA GAGGTGTGGT GAGGATTGTA GATTTGGCTC CTCTGCTAGA
1351 GAAAAGCTTA GCCATGTCTA GAGTTGGTAT CATGTGACCA TAAGCCATGA
1401 AAGGGAAGAA CACAACATGG AGCTTACGAT GAGGATCACT ACTCAT
FIGURE 9A A407 SENSE NUCLEOTIDE SEQUENCE

1 ATGCATATCA CAAAACCACA CGCCGCCATG TTTTCCAGTC CCGGAATGGG
51 CCATGTCCTC CCGGTGATCG AGCTAGCTAA GCGTCTCTCC GCTAACCACG
101 GCTTCCACGT CACCGTCTTC GTCCTTGAAA CTGACGCAGC CTCCGTTCAG
151 TCCAAGCTCC TTAACTCAAC CGGTGTTGAC ATCGTCAACC TTCCATCGCC
201 CGACATTTCT GGCTTGGTAG ACCCCAACGC CCATGTGGTG ACCAAGATCG
251 GAGTCATTAT GCGTGAAGCT GTTCCAACCC TCCGATCCAA GATCGTTGCC
301 ATGCATCAAA ACCCAACGGC TCTGATCATT GACTTGTTTG GCACAGATGC
351 GTTATGTCTT GCAGCGGAGT TAAACATGTT GACTTATGTC TTTATCGCTT
401 CCAACGCGCG TTATCTCGGA GTTTCGATAT ATTATCCAAC TTTGGACGAA
451 GTTATCAAAG AAGAGCACAC AGTGCAACGA AAACCGCTCA CTATACCGGG
501 GTGTGAACCG GTTAGATTTG AAGATATTAT GGATGCATAT CTGGTTCCGG
551 ACGAACCGGT GTACCACGAT TTGGTTCGTC ACTGTCTGGC CTACCCAAAA
601 GCGGATGGAA TCTTGGTGAA TACATGGGAA GAGATGGAGC CCAAATCATT
651 AAAGTCCCTT CAAGACCCGA AACTTTTGGG CCGGGTCGCT CGTGTACCGG
701 TTTATCCGGT TGGTCCGTTA TGCAGACCGA TACAATCATC CACGACCGAT
751 CACCCGGTTT TTGATTGGTT AAACAAACAA CCAAACGAGT CGGTTCTCTA
801 CATTTCCTTC GGGAGTGGTG GTTCTCTAAC GGCTCAACAG TTAACCGAAT
851 TGGCGTGGGG GCTCGAGGAG AGCCAGCAAC GGTTTATATG GGTGGTTCGA
901 CCGCCCGTTG ACGGCTCGTC TTGCAGTGAT TATTTCTCGG CTAAAGGCGG
951 TGTAACCAAA GACAACACGC CAGAGTATCT ACCAGAAGGG TTCGTGACTC
1001 GTACTTGCGA TAGAGGTTTC ATGATCCCAT CATGGGCACC GCAAGCTGAA
1051 ATCCTAGCCC ATCAGGCCGT TGGTGGGTTT TTAACACATT GTGGTTGGAG
1101 CTCGACGTTG GAAAGCGTCC TTTGCGGCGT TCCAATGATA GCGTGGCCGC
1151 TTTTCGCCGA GCAGAATATG AACGCGGCGT TGCTTAGCGA TGAACTGGGA
1201 ATCTCTGTTA GAGTGGATGA TCCAAAGGAG GCGATTTCTA GGTCGAAGAT
1251 TGAGGCGATG GTGAGGAAGG TTATGGCTGA GGACGAAGGT GAAGAGATGA
1301 GAAGGAAAGT GAAGAAGTTG AGAGACACGG CGGAGATGTC ACTTAGTATT
1351 CACGGTGGTG GTTCGGCGCA TGAGTCGCTT TGCAGAGTCA CGAAGGAGTG
1401 TCAACGGTTT TTGGAATGTG TCGGGGACTT GGGACGTGGT GCTTAG
FIGURE 9B A407 AMINO ACID SEQUENCE

1 MHITKPHAAM FSSPGMGHVL PVIELAKRLS ANHGFHVTVF VLETDAASVQ 

51 SKLLNSTGVD IVNLPSPDIS GLVDPNAHW TKIGVIMREA VPTLRSKIVA 

101 MHQNPTALII DLFGTDALCL AAELNMLTYV FIASNARYLG VSIYYPTLDE 

151 VIKEEHTVQR KPLTIPGCEP VRFEDIMDAY LVPDEPVYHD LVRHCLAYPK 

201 ADGILVNTWE EMEPKSLKSL QDPKLLGRVA RVPVYPVGPL CRPIQSSTTD 

251 HPVFDWLNKQ PNESVLYISF GSGGSLTAQQ LTELAWGLEE SQQRFIWWR 

301 PPVDGSSCSD YFSAKGGVTK DNTPEYLPEG FVTRTCDRGF MIPSWAPQAE 

351 ILAHQAVGGF LTHCGWSSTL ESVLCGVPMI AWPLFAEQNM NAALLSDELG 

401 ISVRVDDPKE AISRSKIEAM VRKVMAEDEG EEMRRKVKKL RDTAEMSLSI 

451 HGGGSAHESL CRVTKECQRF LECVGDLGRG A
FIGURE 9C A407 ANTISENSE NUCLEOTIDE SEQUENCE

1 CTAAGCACCA CGTCCCAAGT CCCCGACACA TTCCAAAAAC CGTTGACACT 
51 CCTTCGTGAC TCTGCAAAGC GACTCATGCG CCGAACCACC ACCGTGAATA 
101 CTAAGTGACA TCTCCGCCGT GTCTCTCAAC TTCTTCACTT TCCTTCTCAT 
151 CTCTTCACCT TCGTCCTCAG CCATAACCTT CCTCACCATC GCCTCAATCT 
201 TCGACCTAGA AATCGCCTCC TTTGGATCAT CCACTCTAAC AGAGATTCCC 
251 AGTTCATCGC TAAGCAACGC CGCGTTCATA TTCTGCTCGG CGAAAAGCGG 
301 CCACGCTATC ATTGGAACGC CGCAAAGGAC GCTTTCCAAC GTCGAGCTCC 
351 AACCACAATG TGTTAAAAAC CCACCAACGG CCTGATGGGC TAGGATTTCA 
401 GCTTGCGGTG CCCATGATGG GATCATGAAA CCTCTATCGC AAGTACGAGT
451 CACGAACCCT TCTGGTAGAT ACTCTGGCGT GTTGTCTTTG GTTACACCGC
501 CTTTAGCCGA GAAATAATCA CTGCAAGACG AGCCGTCAAC GGGCGGTCGA 
551 ACCACCCATA TAAACCGTTG CTGGCTCTCC TCGAGCCCCC ACGCCAATTC 
601 GGTTAACTGT TGAGCCGTTA GAGAACCACC ACTCCCGAAG GAAATGTAGA 
651 GAACCGACTC GTTTGGTTGT TTGTTTAACC AATCAAAAAC CGGGTGATCG 
701 GTCGTGGATG ATTGTATCGG TCTGCATAAC GGACCAACCG GATAAACCGG
751 TACACGAGCG ACCCGGCCCA AAAGTTTCGG GTCTTGAAGG GACTTTAATG 
801 ATTTGGGCTC CATCTCTTCC CATGTATTCA CCAAGATTCC ATCCGCTTTT 
851 GGGTAGGCCA GACAGTGACG AACCAAATCG TGGTACACCG GTTCGTCCGG 
901 AACCAGATAT GCATCCATAA TATCTTCAAA TCTAACCGGT TCACACCCCG 
951 GTATAGTGAG CGGTTTTCGT TGCACTGTGT GCTCTTCTTT GATAACTTCG 
1001 TCCAAAGTTG GATAATATAT CGAAACTCCG AGATAACGCG CGTTGGAAGC 
1051 GATAAAGACA TAAGTCAACA TGTTTAACTC CGCTGCAAGA CATAACGCAT 
1101 CTGTGCCAAA CAAGTCAATG ATCAGAGCCG TTGGGTTTTG ATGCATGGCA 
1151 ACGATCTTGG ATCGGAGGGT TGGAACAGCT TCACGCATAA TGACTCCGAT 
1201 CTTGGTCACC ACATGGGCGT TGGGGTCTAC CAAGCCAGAA ATGTCGGGCG 
1251 ATGGAAGGTT GACGATGTCA ACACCGGTTG AGTTAAGGAG CTTGGACTGA 
1301 ACGGAGGCTG CGTCAGTTTC AAGGACGAAG ACGGTGACGT GGAAGCCGTG 
1351 GTTAGCGGAG AGACGCTTAG CTAGCTCGAT CACCGGGAGG ACATGGCCCA 
1401 TTCCGGGACT GGAAAACATG GCGGCGTGTG GTTTTGTGAT ATGCAT
FIGURE 10A A961 SENSE NUCLEOTIDE SEQUENCE

1 ATGGGGAAGC AAGAAGATGC AGAGCTCGTC ATCATACCTT TCCCTTTCTC 
51 CGGACACATT CTCGCAACAA TCGAACTCGC CAAACGTCTC ATAAGTCAAG 
101 ACAATCCTCG GATCCACACC ATCACCATCC TCTATTGGGG ATTACCTTTT 
151 ATTCCTCAAG CTGACACAAT CGCTTTCCTC CGATCCCTAG TCAAAAATGA 
201 GCCTCGTATC CGTCTCGTTA CGTTGCCCGA AGTCCAAGAC CCTCCACCAA 
251 TGGAACTCTT TGTGGAATTT GCCGAATCTT ACATTCTTGA ATACGTCAAG 
301 AAAATGGTTC CCATCATCAG AGAAGCTCTC TCCACTCTCT TGTCTTCCCG 
351 CGATGAATCG GGTTCAGTTC GTGTGGCTGG ATTGGTTCTT GACTTCTTCT 
401 GCGTCCCTAT GATCGATGTA GGAAACGAGT TTAATCTCCC TTCTTACATT 
451 TTCTTGACGT GTAGCGCAGG GTTCTTGGGT ATGATGAAGT ATCTTCCAGA 
501 GAGACACCGC GAAATCAAAT CGGAATTCAA CCGGAGCTTC AACGAGGAGT 
551 TGAATCTCAT TCCTGGTTAT GTCAACTCTG TTCCTACTAA GGTTTTGCCG 
601 TCAGGTCTAT TCATGAAAGA GACCTACGAG CCTTGGGTCG AACTAGCAGA 
651 GAGGTTTCCT GAAGCTAAGG GTATTTTGGT TAATTCATAC ACAGCTCTCG 
701 AGCCAAACGG TTTTAAATAT TTCGATCGTT GTCCGGATAA CTACCCAACC 
751 ATTTACCCAA TCGGGCCGAT ATTATGCTCC AACGACCGTC CGAATTTGGA 
801 CTCATCGGAA CGAGATCGGA TCATAACTTG GCTAGATGAC CAACCCGAGT 
851 CATCGGTCGT GTTCCTCTGT TTCGGGAGCT TGAAGAATCT CAGCGCTACT 
901 CAGATCAACG AGATAGCTCA AGCCTTAGAG ATCGTTGACT GCAAATTCAT 
951 CTGGTCGTTT CGAACCAACC CGAAGGAGTA CGCGAGCCCT TACGAGGCTC 
1001 TACCACACGG GTTCATGGAC CGGGTCATGG ATCAAGGCAT TGTTTGTGGT 
1051 TGGGCTCCTC AAGTTGAAAT CCTAGCCCAT AAAGCTGTGG GAGGATTCGT 
1101 ATCTCATTGT GGTTGGAACT CGATATTGGA GAGTTTGGGT TTCGGCGTTC 
1151 CAATCGCCAC GTGGCCGATG TACGCGGAAC AACAACTAAA CGCGTTCACG 
1201 ATGGTGAAGG AGCTTGGTTT AGCCTTGGAG ATGCGGTTGG ATTACGTGTC 
1251 GGAAGATGGA GATATAGTGA AAGCTGATGA GATCGCAGGA ACCGTTAGAT 
1301 CTTTAATGGA CGGTGTGGAT GTGCCGAAGA GTAAAGTGAA GGAGATTGCT 
1351 GAGGCGGGAA AAGAAGCTGT GGACGGTGGA TCTTCGTTTC TTGCGGTTAA 
1401 AAGATTCATC GGTGACTTGA TCGACGGCGT TTCTATAAGT AAGTAG 
FIGURE 10B A961 AMINO ACID SEQUENCE

1 MGKQEDAELV IIPFPFSGHI LATIELAKRL ISQDNPRIHT ITILYWGLPF
51 IPQADTIAFL RSLVKNEPRI RLVTLPEVQD PPPMELFVEF AESYILEYVK
101 KMVPIIREAL STLLSSRDES GSVRVAGLVL DFFCVPMIDV GNEFNLPSYI
151 FLTCSAGFLG MMKYLPERHR EIKSEFNRSF NEELNLIPGY VNSVPTKVLP
201 SGLFMKETYE PWVELAERFP EAKGILVNSY TALEPNGFKY FDRCPDNYPT
251 IYPIGPILCS NDRPNLDSSE RDRIITWLDD QPESSWFLC FGSLKNLSAT
301 QINEIAQALE IVDCKFIWSF RTNPKEYASP YEALPHGFMD RVMDQGIVCG
351 WAPQVEILAH KAVGGFVSHC GWNSILESLG FGVPIATWPM YAEQQLNAFT
401 MVKELGLALE MRLDYVSEDG DIVKADEIAG TVRSLMDGVD VPKSKVKEIA
451 EAGKEAVDGG SSFLAVKRFI GDLIDGVSIS K
FIGURE 10C A961 ANTISENSE NUCLEOTIDE SEQUENCE

1 CTACTTACTT ATAGAAACGC CGTCGATCAA GTCACCGATG AATCTTTTAA 

51 CCGCAAGAAA CGAAGATCCA CCGTCCACAG CTTCTTTTCC CGCCTCAGCA 

101 ATCTCCTTCA CTTTACTCTT CGGCACATCC ACACCGTCCA TTAAAGATCT 

151 AACGGTTCCT GCGATCTCAT CAGCTTTCAC TATATCTCCA TCTTCCGACA 

201 CGTAATCCAA CCGCATCTCC AAGGCTAAAC CAAGCTCCTT CACCATCGTG 

251 AACGCGTTTA GTTGTTGTTC CGCGTACATC GGCCACGTGG CGATTGGAAC 

301 GCCGAAACCC AAACTCTCCA ATATCGAGTT CCAACCACAA TGAGATACGA 

351 ATCCTCCCAC AGCTTTATGG GCTAGGATTT CAACTTGAGG AGCCCAACCA 

401 CAAACAATGC CTTGATCCAT GACCCGGTCC ATGAACCCGT GTGGTAGAGC 

451 CTCGTAAGGG CTCGCGTACT CCTTCGGGTT GGTTCGAAAC GACCAGATGA

501 ATTTGCAGTC AACGATCTCT AAGGCTTGAG CTATCTCGTT GATCTGAGTA 

551 GCGCTGAGAT TCTTCAAGCT CCCGAAACAG AGGAACACGA CCGATGACTC 

601 GGGTTGGTCA TCTAGCCAAG TTATGATCCG ATCTCGTTCC GATGAGTCCA

651 AATTCGGACG GTCGTTGGAG CATAATATCG GCCCGATTGG GTAAATGGTT 

701 GGGTAGTTAT CCGGACAACG ATCGAAATAT TTAAAACCGT TTGGCTCGAG 

751 AGCTGTGTAT GAATTAACCA AAATACCCTT AGCTTCAGGA AACCTCTCTG

801 CTAGTTCGAC CCAAGGCTCG TAGGTCTCTT TCATGAATAG ACCTGACGGC 

851 AAAACCTTAG TAGGAACAGA GTTGACATAA CCAGGAATGA GATTCAACTC 

901 CTCGTTGAAG CTCCGGTTGA ATTCCGATTT GATTTCGCGG TGTCTCTCTG 

951 GAAGATACTT CATCATACCC AAGAACCCTG CGCTACACGT CAAGAAAATG 

1001 TAAGAAGGGA GATTAAACTC GTTTCCTACA TCGATCATAG GGACGCAGAA 

1051 GAAGTCAAGA ACCAATCCAG CCACACGAAC TGAACCCGAT TCATCGCGGG 

1101 AAGACAAGAG AGTGGAGAGA GCTTCTCTGA TGATGGGAAC CATTTTCTTG 

1151 ACGTATTCAA GAATGTAAGA TTCGGCAAAT TCCACAAAGA GTTCCATTGG 

1201 TGGAGGGTCT TGGACTTCGG GCAACGTAAC GAGACGGATA CGAGGCTCAT 

1251 TTTTGACTAG GGATCGGAGG AAAGCGATTG TGTCAGCTTG AGGAATAAAA 

1301 GGTAATCCCC AATAGAGGAT GGTGATGGTG TGGATCCGAG GATTGTCTTG 

1351 ACTTATGAGA CGTTTGGCGA GTTCGATTGT TGCGAGAATG TGTCCGGAGA 

1401 AAGGGAAAGG TATGATGACG AGCTCTGCAT CTTCTTGCTT CCCCAT 
FIGURE 11A A962 SENSE NUCLEOTIDE SEQUENCE

1 ATGGCGAAGC AGCAAGAAGC AGAGCTCATC TTCATCCCAT TTCCAATCCC
51 CGGACACATT CTCGCCACAA TCGAACTCGC GAAACGTCTC ATCAGTCACC
101 AACCTAGTCG GATCCACACC ATCACCATCC TCCATTGGAG CTTACCTTTT
151 CTTCCTCAAT CTGAGACTAT CGCCTTCCTC AAATCCCTAA TCGAAACAGA
201 GTCTCGTATC CGTCTCATTA CCTTACCCGA TGTCCAAAAC CCTCCACCAA
251 TGGAGCTATT TGTGAAAGCT TCCGAATCTT ACATTCTTGA ATACGTCAAG
301 AAAATGGTTC CTTTGGTCAG AAACGCTCTC TCCACTCTCT TGTCTTCTCG
351 TGATGAATCG GATTCAGTTC ATGTCGCCGG ATTAGTTCTT GATTTCTTCT
401 GTGTCCCTTT GATCGATGTC GGAAACGAGT TTAATCTCCC TTCTTACATC
451 TTCTTGACGT GTAGCGCAAG TTTCTTGGGT ATGATGAAGT ATCTTCTGGA
501 GAGAAACCGC GAAACCAAAC CGGAACTTAA CCGGAGCTCT GACGAGGAAA
551 CAATATCAGT TCCTGGTTTT GTTAACTCCG TTCCGGTTAA AGTTTTGCCA
601 CCGGGTTTGT TCACGACTGA GTCTTACGAA GCTTGGGTCG AAATGGCGGA
651 AAGGTTCCCT GAAGCCAAGG GTATTTTGGT CAATTCATTT GAATCTCTAG
701 AACGTAACGC TTTTGATTAT TTCGATCGTC GTCCGGATAA TTACCCACCC
751 GTTTACCCAA TCGGGCCAAT TCTATGCTCC AACGATCGTC CGAATTTGGA
801 TTTATCGGAA CGAGACCGGA TCTTGAAATG GCTCGATGAC CAACCCGAGT
851 CATCTGTTGT GTTTCTCTGC TTCGGGAGCT TGAAGAGTCT CGCTGCGTCT
901 CAGATTAAAG AGATCGCTCA AGCCTTAGAG CTCGTCGGAA TCAGATTCCT
951 CTGGTCGATT CGAACGGACC CGAAGGAGTA CGCGAGCCCG AACGAGATTT
1001 TACCGGACGG GTTTATGAAC CGAGTCATGG GTTTGGGCCT TGTTTGTGGT
1051 TGGGCTCCTC AAGTTGAAAT TCTGGCCCAT AAAGCAATTG GAGGGTTCGT
1101 GTCACACTGC GGTTGGAACT CGATATTGGA GAGTTTGCGT TTCGGAGTTC
1151 CAATTGCCAC GTGGCCAATG TACGCGGAAC AACAACTAAA CGCGTTCACG
1201 ATTGTGAAGG AGCTTGGTTT GGCGTTGGAG ATGCGGTTGG ATTACGTGTC
1251 GGAATATGGA GAAATCGTGA AAGCTGATGA AATCGCAGGA GCCGTACGAT
1301 CTTTGATGGA CGGTGAGGAT GTGCCGAGGA GGAAACTGAA GGAGATTGCG
1351 GAGGCGGGAA AAGAGGCTGT GATGGACGGT GGATCTTCGT TTGTTGCGGT
1401 TAAAAGATTC ATAGATGGGC TTTGA
FIGURE 11B A962 AMINO ACID SEQUENCE

1 MAKQQEAELI FIPFPIPGHI LATIELAKRL ISHQPSRIHT ITILHWSLPF 

51 LPQSDTIAFL KSLIETESRI RLITLPDVQN PPPMELFVKA SESYILEYVK 

101 KMVPLVRNAL STLLSSRDES DSVHVAGLVL DFFCVPLIDV GNEFNLPSYI 

151 FLTCSASFLG MMKYLLERNR ETKPELNRSS DEETISVPGF VNSVPVKVLP 

201 PGLFTTESYE AWVEMAERFP EAKGILVNSF ESLERNAFDY FDRRPDNYPP 

251 VYPIGPILCS NDRPNLDLSE RDRILKWLDD QPESSWFLC FGSLKSLAAS 

301 QIKEIAQALE LVGIRFLWSI RTDPKEYASP NEILPDGFMN RVMGLGLVCG 

351 WAPQVEILAH KAIGGFVSHC GWNSILESLR FGVPIATWPM YAEQQLNAFT 

401 IVKELGLALE MRLDYVSEYG EIVKADEIAG AVRSLMDGED VPRRKLKEIA 

451 EAGKEAVMDG GSSFVAVKRF IDGL



FIGURE 11C A962 ANTISENSE NUCLEOTIDE SEQUENCE

1 TCAAAGCCCA TCTATGAATC TTTTAACCGC AACAAACGAA GATCCACCGT 
51 CCATCACAGC CTCTTTTCCC GCCTCCGCAA TCTCCTTCAG TTTCCTCCTC 
101 GGCACATCCT CACCGTCCAT CAAAGATCGT ACGGCTCCTG CGATTTCATC 
151 AGCTTTCACG ATTTCTCCAT ATTCCGACAC GTAATCCAAC CGCATCTCCA 
201 ACGCCAAACC AAGCTCCTTC ACAATCGTGA ACGCGTTTAG TTGTTGTTCC 
251 GCGTACATTG GCCACGTGGC AATTGGAACT CCGAAACGCA AACTCTCCAA 
301 TATCGAGTTC CAACCGCAGT GTGACACGAA CCCTCCAATT GCTTTATGGG 
351 CCAGAATTTC AACTTGAGGA GCCCAACCAC AAACAAGGCC CAAACCCATG 
401 ACTCGGTTCA TAAACCCGTC CGGTAAAATC TCGTTCGGGC TCGCGTACTC 
451 CTTCGGGTCC GTTCGAATCG ACCAGAGGAA TCTGATTCCG ACGAGCTCTA 
501 AGGCTTGAGC GATCTCTTTA ATCTGAGACG CAGCGAGACT CTTCAAGCTC 
551 CCGAAGCAGA GAAACACAAC AGATGACTCG GGTTGGTCAT CGAGCCATTT 
601 CAAGATCCGG TCTCGTTCCG ATAAATCCAA ATTCGGACGA TCGTTGGAGC 
651 ATAGAATTGG CCCGATTGGG TAAACGGGTG GGTAATTATC CGGACGACGA 
701 TCGAAATAAT CAAAAGCGTT ACGTTCTAGA GATTCAAATG AATTGACCAA 
751 AATACCCTTG GCTTCAGGGA ACCTTTCCGC CATTTCGACC CAAGCTTCGT 
801 AAGACTCAGT CGTGAACAAA CCCGGTGGCA AAACTTTAAC CGGAACGGAG 
851 TTAACAAAAC CAGGAACTGA TATTGTTTCC TCGTCAGAGC TCCGGTTAAG 
901 TTCCGGTTTG GTTTCGCGGT TTCTCTCCAG AAGATACTTC ATCATACCCA
951 AGAAACTTGC GCTACACGTC AAGAAGATGT AAGAAGGGAG ATTAAACTCG 
1001 TTTCCGACAT CGATCAAAGG GACACAGAAG AAATCAAGAA CTAATCCGGC 
1051 GACATGAACT GAATCCGATT CATCACGAGA AGACAAGAGA GTGGAGAGAG 
1101 CGTTTCTGAC CAAAGGAACC ATTTTCTTGA CGTATTCAAG AATGTAAGAT 
1151 TCGGAAGCTT TCACAAATAG CTCCATTGGT GGAGGGTTTT GGACATCGGG 
1201 TAAGGTAATG AGACGGATAC GAGACTCTGT TTCGATTAGG CATTTGAGGA 
1251 AGGCGATAGT GTCAGATTGA GGAAGAAAAG GTAAGCTCCA ATGGAGGATG
1301 GTGATGGTGT GGATCCGACT AGGTTGGTGA CTGATGAGAC GTTTCGCGAG 
1351 TTCGATTGTG GCGAGAATGT GTCCGGGGAT TGGAAATGGG ATGAAGATGA 
1401 GCTCTGCTTC TTGCTGCTTC GCCAT 
UGT71B5 Figure 12

ATGAAGATTGAGCTTGTGTTCATACCTTTGCCGGGGATTGGTCATCTCAGGCCAACCGTGAAGCTAGCG 
AAGCAACTCATAGGCAGCGAAAACCGTCTTTCGATCACCATAATCATCATCCCTTCAAGATTTGACGCC 
GGTGATGCATCCGCCTGTATCGCATCTCTCACCACGTTGTCTCAAGATGATCGCCTCCATTACGAATCC 
ATATCCGTCGCAAAACAACCACCAACCTCCGACCCGGATCCTGTTCCGGGTCAAGTGTACATAGAGAAA 
CAAAAGACGAAAGTGAGAGATGCAGTCGCGGCGAGAATCGTCGATCCAACAAGAAAGCTCGCGGGATTC 
GTGGTGGACATGTTCTGTTCCTCGATGATCGATGTAGCTAACGAGTTTGGAGTTCCGTGTTATATGGTA 
TACACATCGAACGCTACGTTTTTAGGAACCATGCTTCACGTTCAACAAATGTACGATCAAAAGAAGTAT 
GACGTCAGCGAGTTAGAAAACTCGGTCACCGAGTTGGAGTTTCCGTCTCTGACTCGTCCTTATCCAGTG 
AAGTGTCTTCCTCATATCCTCACTTCAAAGGAGTGGTTACCTCTCTCTCTAGCTCAAGCTAGGTGTTTC
CGGAAGATGAAGGGTATTTTGGTAAATACAGTTGCTGAGCTTGAACCTCACGCTTTGAAAATGTTCAAT 
ATTAATGGTGACGATCTTCCTCAAGTTTATCCTGTTGGACCAGTGTTGCATCTCGAAAACGGCAATGAC 
GATGATGAGAAGCAATCGGAAATTTTGCGGTGGCTCGACGAGCAACCGTCTAAATCTGTTGTGTTTCTC 
TGCTTTGGGAGCTTGGGAGGTTTCACTGAAGAACAAACAAGAGAAACCGCTGTGGCCCTAGATAGAAGC 
GGTCAGCGGTTTCTTTGGTGTCTTCGTCACGCATCGCCAAATATAAAAACAGATCGTCCCAGAGATTAC 
ACGAATCTTGAGGAGGTTTTACCGGAGGGGTTCTTGGAACGGACTTTGGATAGAGGGAAAGTGATTGGA 
TGGGCACCACAAGTGGCGGTACTAGAGAAGCCGGCGATAGGAGGGTTTGTCACTCACTGCGGTTGGAAC 
TCTATTTTAGAGAGCTTGTGGTTCGGTGTTCCAATGGTGACGTGGCCGCTATACGCGGAACAGAAGGTT 
AACGCGTTTGAGATGGTTGAGGAGCTGGGTTTGGCGGTGGAGATACGGAAGTACTTAAAAGGAGATTTG 
TTCGCCGGAGAGATGGAGACGGTTACCGCGGAGGATATAGAGAGAGCCATTAGGCGTGTGATGGAGCAA 
GACAGTGACGTTAGGAACAACGTGAAAGAGATGGCGGAGAAGTGCCACTTCGCGTTAATGGACGGTGGA 
TCTTCGAAGGCGGCTTTGGAAAAGTTTATTCAAGACGTGATAGAGAATATGGATTAA 
UGT71C3 Figure 13

ATGAAAGCAGAAGCAGAGATCATCTTCGTTACATATCCATCCCCTGGTCATCTTCTTGTCTCCATTGAA 
TTCGCTAAATCTCTCATCAAACGTGATGATCGCATCCACACCATCACCATCCTCTACTGGGCTTTACCT 
CTCGCTCCTCAAGCCCACCTTTTCGCTAAGTCCCTCGTTGCTTCACAGCCTCGAATCCGTCTCCTTGCG 
TTGCCTGATGTTCAAAACCCTCCACCATTGGAACTCTTCTTTAAAGCTCCCGAAGCTTATATTCTTGAG 
TCCACCAAGAAAACAGTTCCTTTAGTCAGAGACGCTCTCTCCACTCTAGTTTCTTCACGTAAAGAATCC 
GGTTCGGTTCGTGTAGTCGGTTTGGTTATCGATTTTTTTTGTGTTCCAATGATCGAAGTGGCAAACGAG 
CTTAACCTTCCTTCTTACATCTTCCTAACGTGTAACGCTGGGTTTTTAAGTATGATGAAGTATCTCCCT 
GAGAGACATCGCATAACCACTTCTGAGCTAGATTTAAGCTCCGGCAACGTAGAACATCCAATTCCTGGC 
TACGTCTGCTCCGTGCCGACGAAGGTTTTGCCTCCAGGTCTATTCGTGAGAGAGTCCTACGAGGCTTGG 
GTCGAGATTGCAGAGAAGTTCCCTGGAGCCAAGGGCATTTTGGTAAACTCAGTCACATGTCTTGAGCAG 
AATGCATTTGATTACTTCGCTCGTCTTGATGAGAACTATCCTCCGGTTTACCCGGTCGGACCGGTTCTT 
AGTTTGAAGGATCGTCCGTCTCCAAATCTGGACGCATCGGACCGGGATCGGATCATGAGATGGCTCGAG 
GACCAGCCGGAGTCGTCAATTGTGTATATCTGCTTCGGAAGCCTCGGAATCATTGGCAAGCTGCAGATT 
GAAGAGATAGCTGAAGCCTTGGAACTCACCGGCCACAGGTTTCTTTGGTCAATACGTACAAATCCGACG 
GAGAAAGCGAGCCCGTACGATCTGTTGCCGGAGGGATTTCTCGATCGGACGGCCAGTAAGGGATTGGTG 
TGTGATTGGGCCCCGCAAGTAGAAGTTCTGGCCCATAAAGCGCTCGGAGGATTCGTGTCTCACTGCGGT 
TGGAACTCTGTACTGGAGAGCTTATGGTTCGGTGTTCCGATCGCCACGTGGCCAATGTACGCTGAGCAA 
CAGTTAAACGCATTCTCGATGGTGAAGGAGTTAGGGTTAGCCGTGGAGCTGCGTTTAGACTACGTTTCG 
GCGTACGGAGAGATAGTAAAAGCTGAGGAGATCGCGGGAGCCATACGATCATTGATGGACGGTGAGGAT 
ACGCCGAGGAAGAGAGTGAAGGAGATGGCGGAAGCGGCGAGGAATGCTTTGATGGACGGAGGATCTTCG 
TTTGTTGCGGTTAAACGATTTCTCGACGAGTTGATCGGCGGAGATGTTTAG 

ATGAAGACAGCAGAGCTCATATTCGTTCCTCTGCCGGAGACCGGCCATCTCTTGTCAACGATCGAGTTT

GGAAAGCGTCTACTCAATCTAGACCGTCGGATTTCTATGATTACAATCCTCTCCATGAATCTTCCTTAC 

GCTCCTCACGCCGACGCTTCTCTTGCTTCGCTAACAGCCTCCGAGCCTGGTATCCGAATCATCAGTCTC 

CCGGAGATCCACGATCCACCTCCGATCAAGCTTCTTGACACTTCCTCCGAGACTTACATCCTCGATTTC 

ATCCATAAAAACATAGCTTGTCTCAGAAAAACCATCCAAGATTTAGTCTCATCATCATCATCTTCCGGA 

GGTGGTAGTAGTCATGTCGCCGGCTTGATTCTTGATTTCTTCTGCGTTGGTTTGATCGACATCGGCCGT 

GAGGTAAACCTTCCTTCCTATATCTTCATGACTTCCAACTTTGGTTTCTTAGGGGTTCTACAGTATCTC 

CCGGAACGACAACGTTTGACTCCGTCGGAGTTCGATGAGAGCTCCGGCGAGGAAGAGTTACATATTCCG 

GCGTTTGTGAACCGTGTTCCCGCCAAGGTTCTGCCGCCAGGTGTGTTCGATAAACTCTCTTACGGGTCT 

CTGGTCAAAATCGGCGAGCGATTACATGAAGCCAAGGGTATTTTGGTTAATTCATTTACCCAAGTGGAG 

CCTTATGCTGCTGAACATTTTTCTCAAGGACGAGATTACCCTCACGTGTATCCTGTTGGGCCGGTTCTC 

AACTTAACGGGCCGTACAAATCCGGGTCTAGCTTCGGCCCAATATAAAGAGATGATGAAGTGGCTTGAC 

GAGCAACCAGACTCGTCGGTTTTGTTCCTGTGTTTCGGGAGCATGGGAGTCTTCCCTGCACCTCAGATC 

ACAGAGATTGCTCACGCGCTCGAGCTTATCGGGTGCAGGTTCATCTGGGCGATCCGTACGAACATGGCG 

GGAGATGGCGATCCTCAGGAGCCGCTTCCAGAAGGATTTGTCGATCGAACAATGGGCCGTGGAATTGTG 

TGTAGTTGGGCTCCACAAGTGGATATCTTGGCCCACAAGGCAACAGGTGGATTCGTTTCTCACTGCGGG 

TGGAATTCCGTCCAAGAGAGTCTATGGTACGGTGTACCTATTGCAACGTGGCCAATGTATGCGGAGCAA 

CAACTGAACGCATTTGAGATGGTGAAGGAGTTGGGCTTAGCAGTGGAGATAAGGCTTGACTACGTGGCG 

GATGGTGATAGGGTTACTTTGGAGATCGTGTCAGCCGATGAAATAGCCACAGCCGTCCGATCATTGATG 

GATAGTGATAACCCCGTGAGAAAGAAGGTTATAGAAAAATCTTCAGTGGCGAGGAAAGCTGTTGGTGAT 

GGTGGGTCTTCTACGGTGGCCACATGTAATTTTATCAAAGATATTCTTGGGGATCACTTTTGA 

ATGCGGAATGTAGAGCTCATCTTCATCCCCACACCAACCGTTGGTCATCTTGTTCCGTTTCTTGAATTT
GCTAGGCGTCTCATTGAGCAAGATGATAGGATCCGTATCACAATCCTCTTGATGAAACTACAAGGTCAG 
TCTCATCTAGACACTTATGTTAAATCAATTGCCTCCTCTCAACCGTTTGTTAGAT^ATTGA^GTCCCT 
GAGTTAGAGGAGAAACCTACACTTGGTAGTACACAATCTGTGGAAGCTTATGTGT'VTGATGTTATTGAG 
AGAAATATCCCTCTTGTGAGGAATATAGTCATGGATATTTTAACTTCTCTTGCATTGGATGGAGTTAAG 

gtcaagggattagttgttgactttttctgtctccctatgattgacgttgctaaagatataaSc" 
ttctatgtgttcttgactacamttccgggttcttagctatgatgcagtatc^ 
agagatacatcggtttttgtaagaaactcggaagaaatgttgtcgatacctggatttgtaIa^ 
ccagccaatgttctgccgtcagctctgtttgttgaagatggttatgatgcttacgtt^ 

TTGTTTACAAAGGCCAATGGAATCCTAGTGAATAGCTCCTTTGATATTGAGCCTTACrcTGTGA^TCAT 
TTTCTTCAAGAACAGAATTATCCTTCTGTTTATGCTGTTGGCCCCATATTTGACTTGAAAGCCC^GCCT 
CATCCAGAGCAGGACCTAACCCGTCGTGACGAGTTGATGAAATGGCTTGATGATCAAC^CGAGG^ 

gttgtattcctttgttttgggagtatggcaaggttaagaggttctctagtgaaggSS 

CTTGAGCTATGTCAATATAGATTCCTCTGGTCACTCCGTAAAGAAGAGGTGACAAAGGATGATTTGCCA 
GAGGGGTTCCTTGACCGTGTCGATGGACGTGGAATGATATGTGGTTGGTCTCCTCAGGTAGA^ATACTG 

GGCGTGCCAATTGTGACATGGCCAATGTATGCAGAGCAACAACTCAATGCGTTTCTGATGGTGAAGGAA 
CTGAAGCTAGCTGTGGAGCTGAAGCTTGATTACAGGGTACATAGTGATGAGATAGTAAACGCA^ACGAG 
ATAGAGACCGCTATTCGTTATGTAATGGACACGGATAATAATGTTGTGAGGAAACGAGTGATGGATAT^ 

GACSS^ 

ATGGGAACTCCTGTCGAAGTCTCTAAGCTCCATTTCTTGCTCTTCCCTTTCATGGCTCATGGCCATATG

ATACCAACTCTAGACATGGCTAAGCTCTTTGCCACCAAAGGAGCTAAATCCACTATCCTCACTACACCT 

CTCAATGCCAAGCTCTTCTTCGAGAAACCCATCAAATCATTCAACCAAGACAACCCGGGACTCGAAGAC 

ATCACCATCCAGATCCTTAATTTCCCTTGCACAGAGCTTGGTTTGCCTGATGGCTGTGAGAATACTGAT 

TTCATCTTCTCCACACCTGACCTAAACGTAGGTGACTTGAGTCAAAAGTTTTTACTCGCAATGAAATAT 

TTCGAAGAGCCACTAGAGGAGCTCCTCGTGACAATGAGACCAGACTGTCTTGTCGGTAACATGTTCTTC 

CCTTGGTCCACTAAAGTTGCTGAGAAGTTCGGAGTACCGAGACTTGTGTTCCACGGCACAGGCTACTTC 

TCTTTATGTGCTTCTCATTGCATAAGGCTCCCTAAGAATGTGGCAACAAGTTCTGAGCCCTTTGTGATT 

CCTGATCTCCCGGGAGACATTTTGATTACAGAGGAACAGGTCATGGAGACAGAAGAAGAGTCTGTAATG 

GGGAGGTTTATGAAGGCAATAAGAGACTCAGAGAGAGATAGCTTTGGCGTGTTGGTGAACAGCTTCTAC 

GAGCTTGAACAGGCTTACTCAGATTATTTCAAGAGCTTTGTGGCGAAAAGAGCGTGGCATATCGGTCCG 

CTTTCCTTAGGAAATAGAAAGTTCGAGGAGAAAGCAGAAAGAGGCAAAAAGGCAAGCATTGATGAGCAT 

GAATGTTTGAAATGGCTCGACTCCAAGAAATGTGATTCAGTGATTTACATGGCCTTTGGAACCATGTCT 

AGCTTTAAAAACGAGCAGCTGATAGAGATTGCAGCTGGTTTAGATATGTCAGGACATGATTTTGTCTGG 

GTGGTTAACAGAAAAGGCAGCCAAGGTACCATAGACATCACTCTCTTTGCAGCAAAATCCTCTGTTTTT 

GTTTTAGAGAAAAACCAATGATCTAATTAGGATTCTACTGTTTCAAACTCTAACTTTTGCGTTTGCATT 

ACATATAAATAGTTGAGAAGGAAGATTGGTTACCAGAGGGGTTTGAAGAGAAGACCAAGGGAAAAGGAT 

TGATAATCCGAGGGTGGGCGCCACAAGTGCTGATACTTGAGCACAAAGCAATTGGCGGATTTTTGACGC 

ATTGTGGATGGAACTCGTTATTAGAAGGGGTGGCAGCGGGCCTGCCAATGGTGACATGGCCCGTGGGAG 

CCGAGCAGTTCTACAACGAGAAATTGGTGACACAAGTGTTGAAAACAGGAGTGAGTGTGGGAGTGAAGA 

AGATGATGCAAGTAGTTGGAGACTTCATTAGCAGAGAGAAAGTGGAGGGAGCGGTGAGGGAAGTGATGG 

TTGGAGAAGAGAGGAGGAAACGGGCCAAGGAGTTAGCAGAAATGGCGAAAAATGCGGTGAAAGAAGGAG 

GATCTTCAGATCTAGAGGTAGATAGGTTGATGGAAGAGCTTACGTTAGTTAAACTGCAAAAAGAGAAGG 
TATAA 

ATGGGTAGTGATCATCATCATCGAAAGCTCCACGTTATGTTCTTCCCTTTCATGGCTTATGGTCACATG

ATACCAACTCTAGACATGGCTAAGCTTTTCTCTAGCAGAGGAGCCAAATCCACAATCCTCACCACATCT 

CTCAACTCCAAGATCCTCCAAAAACCCATCGACACATTCAAGAATCTGAATCCGGGTCTCGAAATCGAC 

ATCCAGATCTTCAATTTCCCTTGCGTGGAGCTGGGGTTACCAGAAGGATGTGAAAACGTTGATTTCTTC 

ACTTCAAACAACAATGATGATAAAAACGAGATGATCGTGAAATTCTTTTTCTCGACAAGGTTTTTCAAA

GACCAGCTTGAGAAACTCCTCGGGACAACGAGACCAGACTGTCTTATCGCCGACATGTTCTTCCCCTGG 

GCTACTGAAGCTGCTGGGAAGTTCAATGTGCCAAGACTTGTGTTCCACGGCACTGGCTACTTCTCTTTA 

TGCGCTGGTTATTGCATCGGAGTGCATAAACCACAGAAGAGAGTGGCTTCAAGCTCTGAGCCATTTGTG 

ATTCCCGAGCTCCCTGGGAACATTGTGATAACTGAAGAACAGATCATAGATGGCGATGGAGAATCCGAC 

ATGGGAAAGTTTATGACTGAAGTTAGGGAATCGGAAGTGAAGAGCTCAGGAGTTGTTTTGAATAGTTTC 

TACGAGCTAGAACATGATTACGCCGATTTTTACAAAAGTTGTGTACAAAAGAGAGCGTGGCATATCGGT 

CCGCTATCGGTTTACAACAGGGGATTTGAGGAGAAGGCTGAGAGAGGAAAGAAAGCGAACATTGATGAG 

GCTGAATGCCTCAAATGGCTTGACTCCAAGAAACCAAATTCAGTCATTTATGTTTCCTTTGGGAGCGTG 

GCTTTCTTCAAGAATGAACAGTTATTCGAGATCGCTGCAGGGTTAGAAGCTTCCGGTACAAGTTTCATT 

TGGGTTGTTAGGAAAACCAAAGGTATTGAAATTGACGTTTGAAGCCTATATTATATAGCTGTAATTTGG 

GTAGCTTTGATTTTAATCTGACACAAGATTTGGTGTGAACAGATGATAGAGAAGAATGGTTACCAGAAG 

GGTTCGAAGAGAGGGTGAAAGGGAAAGGTATGATAATAAGAGGATGGGCACCACAGGTGCTGATACTTG 

ACCACCAAGCAACCGGTGGGTTTGTGACCCATTGCGGCTGGAACTCGCTTCTTGAAGGAGTGGCTGCAG 

GGCTACCAATGGTGACATGGCCTGTAGGAGCGGAGCAATTCTACAATGAGAAATTGGTTACGCAAGTGC 

TCAGAACAGGAGTGAGCGTGGGAGCGAGCAAGCATATGAAAGTTATGATGGGAGATTTCATTAGCAGAG 

AGAAAGTGGATAAAGCGGTGAGGGAGGTTTTGGCTGGGGAAGCAGCAGAGGAGAGGCGGAGACGGGCAA 

AGAAGCTAGCGGCGATGGCTAAAGCTGCCGTGGAAGAAGGAGGGTCTTCCTTCAACGATCTAAACAGCT 
TCATGGAAGAGTTTAGTTCATAA 
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ATGAACAGAGAGCAAATTCATATTTTGTTCTTCCCCTTCATGGCTCATGGCCACATGATTCCACTCTTA 
GACATGGCCAAGCTTTTCGCTAGAAGAGGAGCCAAATCAACTCTCCTCACAACCCCAATAAATGCTAAG 
ATCTTGGAGAAACCCATTGAAGCATTCAAAGTTCAAAATCCTGATCTCGAAATCGGAATCAAGATCCTC 
AATTTCCCTTGTGTAGAGCTTGGATTGCCAGAAGGATGCGAGAACCGTGACTTCATTAACTCATACCAA 
AAATCTGACTCATTTGACTTGTTCTTGAAGTTTCTTTTCTCTACCAAGTATATGAAACAGCAGTTGGAG 
AGTTTCATTGAAACAACCAAACCGAGTGCTCTTGTAGCCGATATGTTCTTCCCTTGGGCAACAGAATCC 
GCGGAGAAGATCGGTGTTCCAAGACTTGTGTTCCACGGCACATCATCCTTTGCCTTGTGTTGTTCGTAT 
AACATGAGGATTCATAAGCCACACAAGAAAGTCGCTTCGAGTTCTACTCCATTTGTAATCCCTGGTCTC 
CCTGGAGACATAGTTATTACAGAAGACCAAGCCAATGTCACCAACGAAGAAACTCCATTCGGAAAGTTT 
TGGAAAGAAGTCAGGGAATCAGAGACCAGTAGCTTTGGTGTTTTGGTGAATAGCTTCTACGAGCTGGAA 
TCATCTTATGCTGATTTTTACCGTAGTTTTGTGGCGAAAAAAGCGTGGCATATAGGTCCACTTTCACTA 
TCCAACAGAGGGATTGCAGAGAAAGCCGGAAGAGGGAAAAAGGCAAACATTGATGAGCAAGAATGCCTC 
AAATGGCTTGACTCTAAGACACCTGGCTCAGTAGTTTACTTGTCCTTTGGTAGCGGAACCGGCTTACCC 
AACGAACAGCTGTTAGAGATTGCTTTCGGCCTTGAAGGCTCTGGACAAAATTTCATTTGGGTGGTTAGC 
AAAAATGAAAACCAAGGTAATTTTTTTCCTCCTTAACCATTATTAATCAATGTAGTCTTTATTAGTATA 

tttccaaaaatattaacatttgtgtatacattttcctattgccaaatatgctatgatgccataGcaatg 

AGTAGATTGGTTTGTGTACTTTATATATTACTTTGTAGAACTTCTAACAATTATGACTTGGTGTTGGTG 
TAGTTGGGACAGGTGAAAATGAAGATTGGTTGCCTAAAGGGTTTGAAGAGAGGAATAAAGGAAAAGGGC 
TGATAATACGCGGATGGGCCCCGCAAGTGCTGATACTTGACCACAAAGCAATCGGAGGATTTGTGACGC 
ATTGCGGATGGAACTCGACTTTGGAGGGCATTGCCGCAGGGCTGCCTATGGTGACTTGGCCGATGGGGG 
CAGAACAGTTCTACAACGAGAAGTTATTGACAAAAGTGTTGAGAATAGGAGTGAACGTTGGAGCTACCG 
AGTTGGTGAAAAAAGGAAAGTTGATTAGTAGAGCACAAGTGGAGAAGGCAGTAAGGGAAGTGATTGGTG 
GTGAGAAGGCAGAGGAAAGGCGGCTAAGGGCTAAGGAGCTGGGCGAGATGGCTAAAGCCGCTGTGGAAG 
AAGGAGGGTCTTCTTATAATGATGTGAACAAGTTTATGGAAGAGCTGAATGGTAGAAAGTAG 
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ATGAACAGAGAAGTCTCTGAGAGAATTCATATTTTGTTCTTCCCCTTCATGGCTCAAGGCCACATGATT 

CCAATTTTGGACATGGCCAAGCTTTTCTCGAGGAGAGGAGCCAAGTCAACCCTTCTCACAACCCCAATC 

AACGCTAAGATCTTCGAGAAACCTATTGAAGCATTCAAAAATCAAAACCCTGATCTCGAAATCGGAATC 

AAGATCTTCAATTTCCCTTGTGTAGAGCTTGGATTGCCTGAAGGATGCGAGAACGCTGACTTTATCAAC 

TCATACCAAAAATCTGACTCAGGTGACTTGTTCTTGAAGTTTCTTTTCTCTACCAAGTATATGAAACAA 

CAGTTGGAGAGTTTCATTGAAACAACCAAACCAAGTGCTCTTGTTGCCGATATGTTCTTCCCTTGGGCG 

ACAGAATCTGCTGAGAAGCTCGGTGTACCAAGACTTGTGTTCCACGGTACATCTTTCTTTTCTTTGTGT 

TGTTCGTATAACATGAGGATTCATAAGCCACACAAGAAAGTCGCTACGAGTTCTACTCCTTTTGTAATC 

CCTGGTCTCCCAGGAGACATAGTTATTACAGAAGACCAAGCCAATGTTGCCAAAGAAGAAACGCCAATG 

GGAAAGTTTATGAAAGAGGTTAGGGAATCAGAGACCAATAGCTTTGGTGTATTGGTTAATAGCTTCTAC 

GAGCTGGAATCAGCTTATGCTGATTTTTATCGTAGTTTTGTGGCGAAAAGAGCTTGGCATATCGGTCCG 

CTTTCGCTATCTAACAGAGAGTTAGGAGAGAAAGCCAGAAGAGGGAAAAAGGCTAACATTGATGAGCAA 

GAATGCCTAAAATGGCTGGACTCTAAGACACCTGGTTCAGTAGTTTACTTGTCCTTTGGGAGCGGAACT 

AATTTCACCAACGACCAGCTGTTAGAGATCGCTTTTGGTCTTGAAGGTTCTGGACAAAGTTTCATCTGG 

GTGGTTAGGAAAAATGAAAACCAAGGTAAATTGTTTCTCCCCAGCCATTATTAACCAACATAGTAATGT 

TAATATTTGTGTATATATTCGTATTGCCAAATATGCTCTGATACCATGGCAAGTAATAGATTGGCTCAT 

GTATTTTATTTGTGATCATGTAGAATTTTCTTAACAGTTATGACTTGGTGTTGGTATGGTTGGGACAGG 

TGACAATGAAGAGTGGTTGCCTGAAGGGTTTAAAGAGAGGACAACAGGGAAAGGGCTAATAATACCTGG 

ATGGGCGCCGCAAGTGCTGATACTTGACCATAAAGCAATTGGAGGATTTGTGACTCATTGCGGATGGAA 

CTCGGCTATAGAGGGCATTGCCGCGGGGCTGCCTATGGTAACATGGCCAATGGGGGCAGAACAGTTCTA 

CAATGAGAAGCTATTGACAAAAGTGTTGAGAATAGGAGTGAACGTTGGAGCTACCGAGTTGGTGAAAAA 

AGGAAAGTTGATTAGTAGAGCACAAGTGGAGAAGGCAGTAAGGGAAGTGATTGGTGGTGAGAAGGCAGA 

GGAAAGGCGGCTATGGGCTAAGAAGCTGGGCGAGATGGCTAAAGCCGCTGTGGAAGAAGGAGGGTCCTC 

TTATAATGATGTGAACAAGTTTATGGAAGAGCTGAATGGTAGAAAGTAG
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™^™r cGTccTc ^ 

CCAATGGTAGATATTGCAAGGCTCCTGGCTCAGCGCGGGGTGACTATAACCATTGTCACTACACCTCAA 
AACGCAGGCCGGTTCAAGAACGTTCTTAGCCGGGCTATCCAATCCGGCTTGCCCATCAATCTCGTGCAA 
GTAAAGTTTCCATCTCAAGAATCGGGTTCACCGGAAGGACAGGAGAATTTGGACTTGCTCGATTCATTG 
GGGGCTTCATTAACCTTCTTCAAAGCATTTAGCCTGCTCGAGGAACCAGTCGAGAAGCTCTTGAAAGAG 
ATTCAACCTAGGCCAAACTGCATAATCGCTGACATGTGTTTGCCTTATACAAACAGAATTGCCAAGAAT 
CTTGGTATACCAAAAATCATCTTTCATGGCATGTGTTGCTTCAATCTTCTTTGTACGCACATAATGCAC 
CAAAACCACGAGTTCTTGGAAACTATAGAGTCTGACAAGGAATACTTCCCCATTCCTAATTTCCCTGAC 

a s g ij gag ii caca ^ 

GGAATGACAGAAGGGGATAACACTTCTTATGGTGTGATTGTTAACACGTTTGAAGAGCTCGAGCCAGCT 
TATGTTAGAGACTACAAGAAGGTTAAAGCGGGTAAGATATGGAGCATCGGACCGGTTTCCTTGTGCAAC 
AAGTTAGGAGAAGACCAAGCTGAGAGGGGAAACAAGGCGGACATTGATCAAGACGAGTGTATTAAATGG 
CTTGATTCTAAAGAAGAAGGGTCGGTGCTATATGTTTGCCTTGGAAGTATATGCAATCTTCCTCTGTCT 
CAGCTCAAAGAGCTCGGCTTAGGCCTCGAGGAATCCCAAAGACCTTTCATTTGGGTCATAAGAGGTTGG 
GA ^f AT ^ CGAGTTACTTGAATGGATCTCAGAGAGC GGTTATAAGGA^ 

CTTCTCATAACAGGATGGTCGCCTCAAATGCTTATCCTTACACATCCTGCCGTTGGAGGATTCTTGACA 
GGAGAGCAA II CTGCAATGAGAAATTGGCG GTGCAGATACTAAAAGC 

GAG I CCA r G f ATGGGGAGAAGAGGAG ^TAGGAGTACTGGTGGATAAAGAAGGAGTAAAG^ 
GTGGAGGAATTGATGGGTGATAGTAATGATGCTAAGGAGAGAAGAAAAAGAGTGAAAGAGCTTGGAGAA 
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ATGGCTACGGAAAAAACCCACCAATTTCATCCTTCTCTTCACTTTGTCCTCTTCCCTTTCATGGCTCAA 

GGCCACATGATTCCCATGATTGATATTGCAAGACTCTTGGCTCAGCGTGGTGTGACCATAACAATTGTC 

ACGACACCTCACAACGCAGCAAGGTTTAAGAATGTCCTAAACCGAGCGATCGAGTCTGGCTTGGCCATC 

AACATACTGCATGTGAAGTTTCCATATCAAGAGTTTGGTTTGCCAGAAGGAAAAGAGAATATAGATTCG 

TTAGACTCAACGGAGTTGATGGTACCTTTCTTCAAAGCGGTGAACTTGCTTGAAGATCCGGTCATGAAG 

CTCATGGAAGAGATGAAACCTAGACCTAGCTGTCTAATTTCTGATTGGTGTTTGCCTTATACAAGCATA 

ATCGCCAAGAACTTCAATATACCAAAGATAGTTTTCCACGGCATGGGTTGCTTTAATCTTTTGTGTATG 

CATGTTCTACGCAGAAACTTAGAGATCCTAGAGAATGTAAAGTCGGATGAAGAGTATTTCTTGGTTCCT 

AGTTTTCCTGATAGAGTTGAATTTACAAAGCTTCAACTTCCTGTGAAAGCAAATGCAAGTGGAGATTGG 

AAAGAGATAATGGATGAAATGGTAAAAGCAGAATACACATCCTATGGTGTGATCGTCAACACATTTCAG 

GAGTTGGAGCCACCTTATGTCAAAGACTACAAAGAGGCAATGGATGGAAAAGTATGGTCCATTGGACCC 

GTTTCCTTGTGTAACAAGGCAGGTGCAGACAAAGCTGAGAGGGGAAGCAAGGCCGCCATTGATCAAGAT 

GAGTGTCTTCAATGGCTTGATTCTAAAGAAGAAGGTTCGGTGCTCTATGTTTGCCTTGGAAGTATATGT 

AATCTTCCTTTGTCTCAGCTCAAGGAGCTGGGGCTAGGCCTTGAGGAATCTCGAAGATCTTTTATTTGG 

GTCATAAGAGGTTCGGAAAAGTATAAAGAACTATTTGAGTGGATGTTGGAGAGCGGTTTTGAAGAAAGA 

ATCAAAGAGAGAGGACTTCTCATTAAAGGGTGGGCACCTCAAGTCCTTATCCTTTCACATCCTTCCGTT 

GGAGGATTCCTGACACACTGTGGATGGAACTCGACTCTCGAAGGAATCACCTCAGGCATTCCACTGATC 

ACTTGGCCGCTGTTTGGAGACCAATTCTGCAACCAAAAACTGGTCGTTCAAGTACTAAAAGCCGGTGTA 

AGTGCCGGGGTTGAAGAAGTCATGAAATGGGGAGAAGAAGATAAAATAGGAGTGTTAGTGGATAAAGAA 

GGAGTGAAAAAGGCTGTGGAAGAATTGATGGGTGATAGTGATGATGCAAAAGAGAGGAGAAGAAGAGTC 

AAAGAGCTTGGAGAATTAGCTCACAAAGCTGTGGAAAAAGGAGGCTCTTCTCATTCTAACATCACACTC 
TTGCTACAAGACATAATGCAACTAGCACAATTCAAGAATTGA
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ATGGTTTCCGAAACAACCAAATCTTCTCCACTTCACTTTGTTCTCTTCCCTTTCATGGCTCAAGGCCAC 

ATGATTCCCATGGTTGATATTGCAAGGCTCTTGGCTCAGCGTGGTGTGATCATAACAATTGTCACGACG 

CCTCACAATGCAGCGAGGTTCAAGAATGTCCTAAACCGTGCCATTGAGTCTGGCTTGCCCATCAACTTA 

GTGCAAGTCAAGTTTCCATATCTAGAAGCTGGTTTGCAAGAAGGACAAGAGAATATCGATTCTCTTGAC 

ACAATGGAGCGGATGATACCTTTCTTTAAAGCGGTTAACTTTCTCGAAGAACCAGTCCAGAAGCTCATT 

GAAGAGATGAACCCTCGACCAAGCTGTCTAATTTCTGATTTTTGTTTGCCTTATACAAGCAAAATCGCC 

AAGAAGTTCAATATCCCAAAGATCCTCTTCCATGGCATGGGTTGCTTTTGTCTTCTGTGTATGCATGTT 

TTACGCAAGAACCGTGAGATCTTGGACAATTTAAAGTCAGATAAGGAGCTTTTCACTGTTCCTGATTTT 

CCTGATAGAGTTGAATTCACAAGAACGCAAGTTCCGGTAGAAACATATGTTCCAGCTGGAGACTGGAAA 

GATATCTTTGATGGTATGGTAGAAGCGAATGAGACATCTTATGGTGTGATCGTCAACTCATTTCAAGAG 

CTCGAGCCTGCTTATGCCAAAGACTACAAGGAGGTAAGGTCCGGTAAAGCATGGACCATTGGACCCGTT 

TCCTTGTGCAACAAGGTAGGAGCCGACAAAGCAGAGAGGGGAAACAAATCAGACATTGATCAAGATGAG 

TGCCTTAAATGGCTCGATTCTAAGAAACATGGCTCGGTGCTTTACGTTTGTCTTGGAAGTATCTGTAAT 

CTTCCTTTGTCTCAACTCAAGGAGCTGGGACTAGGCCTAGAGGAATCCCAAAGACCTTTCATTTGGGTC 

ATAAGAGGTTGGGAGAAGTACAAAGAGTTAGTTGAGTGGTTCTCGGAAAGCGGCTTTGAAGATAGAATC 

CAAGATAGAGGACTTCTCATCAAAGGATGGTCCCCTCAAATGCTTATCCTTTCACATCCATCAGTTGGA 

GGGTTCCTAACACACTGTGGTTGGAACTCGACTCTTGAGGGGATAACTGCTGGTCTACCGCTACTTACA 

TGGCCGCTATTCGCAGACCAATTCTGCAATGAGAAATTGGTCGTTGAGGTACTAAAAGCCGGTGTAAGA 

TCCGGGGTTGAACAGCCTATGAAATGGGGAGAAGAGGAGAAAATAGGAGTGTTGGTGGATAAAGAAGGA 

GTGAAGAAGGCAGTGGAAGAATTAATGGGTGAGAGTGATGATGCAAAAGAGAGAAGAAGAAGAGCCAAA 

GAGCTTGGAGATTCAGCTCACAAGGCTGTGGAAGAAGGAGGCTCTTCTCATTCTAACATCTCTTTCTTG 

CTACAAGACATAATGGAACTGGCAGAACCCAATAATTGA 
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ATGGCTTTCGAAAAAAACAACGAACCTTTTCCTCTTCACTTTGTTCTCTTCCCTTTCATGGCTCAAGGC 
CACATGATTCCCATGGTTGATATTGCAAGGCTCTTGGCTCAGCGAGGTGTGCTTATAACAATTGTCACG 
ACGCCTCACAATGCAGCAAGGTTCAAGAATGTCCTAAACCGTGCCATTGAGTCTGGTTTGCCCATCAAC
CTAGTGCAAGTCAAGTTTCCATATCAAGAAGCTGGTCTGCAAGAAGGACAAGAAAATATGGATTTGCTT 
ACCACGATGGAGCAGATAACATCTTTCTTTAAAGCGGTTAACTTACTCAAAGAACCAGTCCAGAACCTT 
ATTGAAGAGATGAGCCCGCGACCAAGCTGTCTAATCTCTGATATGTGTTTGTCGTATACAAGCGAAATC 
GCCAAGAAGTTCAAAATACCAAAGATCCTCTTCCATGGCATGGGTTGCTTTTGTCTTCTGTGTGTTAAC 
GTTCTGCGCAAGAACCGTGAGATCTTGGACAATTTAAAGTCTGATAAGGAGTACTTCATTGTTCCTTAT 
TTTCCTGATAGAGTTGAATTCACAAGACCTCAAGTTCCGGTGGAAACATATGTTCCTGCAGGCTGGAAA 
GAGATCTTGGAGGATATGGTAGAAGCGGATAAGACATCTTATGGTGTTATAGTCAACTCATTTCAAGAG 
CTCGAACCTGCGTATGCCAAAGACTTCAAGGAGGCAAGGTCTGGTAAAGCATGGACCATTGGACCTGTT 

TGCCTTGAATGGCTCGATTCTAAGGAACCGGGATCTGTGCTCTACGTTTGCCTTGGAAGTATTTGTAAT 
CTTCCTCTGTCTCAGCTCCTTGAGCTGGGACTAGGCCTAGAGGAATCCCAAAGACCTTTCATCTGGGTC 
ATAAGAGGTTGGGAGAAATACAAAGAGTTAGTTGAGTGGTTCTCGGAAAGCGGCTTTGAAGATAGAATC 
CAAGATAGAGGACTTCTCATCAAAGGATGGTCCCCTCAAATGCTTATCCTTTCACATCCTTCTGTTGGA 
GGGTTCTTAACGCACTGCGGATGGAACTCGACTCTTGAGGGGATAACTGCTGGTCTACCAATGCTTACA 
TGGCCACTATTTGCAGACCAATTCTGCAACGAGAAACTGGTCGTACAMTACTAAAAGTCGGTGTAAGT 

GTGAAGAAGGCAGTGGAAGAACTAATGGGTGAGAGTGATGATGCAAAAGAGAGAAGAAGAAGAGCCAAA 

GAGCTTGGAGAATCAGCTCACAAGGCTGTGGAAGAAGGAGGCTCCTCTCATTCTAATATCACTTTCTTG 
CTACAAGACATAATGCAACTAGCACAGTCCAATAATTGA
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ATGTGTTCTCATGATCCTCTTCACTTCGTCGTAATACCCTTTATGGCCCAAGGCCATATGATCCCATTG 
GTCGACATCTCTAGGCTCTTGTCCCAG^^ 

gtagccaagatcaagacttcactctcattttcctctttgt™^ 

ITTCTGTCTCAACAAACGGGTTTGCCAGAAGGGTGCGAGAGTTTAGATATG^ 
JTGGTGAAGTTCTTTGATGCTGCCAACTCACTTGAGGAGCAAGTTGAG^ 

cagccgcggccaagctgcatcattggagacatgagccttcctttcacttca^ 
aagatccccaaacttatcttccatgggttttcttgtttcagcctcatgtc^tacaIgtg^^Ia 

GAGpCACGAAACCTCAGGTCTCTGTGTTGCAACCTGTTGAAGGAAATATGAAAGS 

ATTATTGAAGCTGATAATGACTCTTATGGTGTTATTGTGAACACTTTTGAAG^ 

GCAAGAGAATATAGGAAAGCAAGGGCTGGAAAAGTTTGGTGCGTTGGAC^ 

ttagggttagacaaagctaaaagaggagataaggcttctattggtcaagaccaatgS 

GACTCTCAAGAAACTGGTTCAGTGCTCTACGTTTGCCTTGGAAGTCTA^ 

ctcaaagagctgggactaggccttgaggcatctaatamcctttcatatg^ 

AAATATGGAGATTTAGCAAATTGGATGCAACAAAGCGGATTTGAAGAGCGGATC^ 

gtgatcaaaggttgggcgccgcaagttttcatccto^ 

tgtggatggaactcgacactagaaggaattactgcaggagttccattattgacatgS 
gaacaattcttgaatgagaagttagttgtgcagatactaa^gcag^ 

ttgatgaaatatggaaaagaagaggagataggagcgatggtgagcagagaa^tgtgagS 
gatgagctaatgggtgatagtgmgaagcagaagagagaagaagaaaag^acagS 
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UGT74F2 Figure 25 

ATGGAGCATAAGAGAGGACATGTATTAGCAGTGCCGTACCCAACGCAAGGACACATCACACCATTCCGC
CAATTCTGCAAACGACTTCACTTCAAAGGTCTCAAAACCACTCTCGCTCTCACCACTTTCGTCTTCAAC 

TCCATCAATCCTGACCTATCCGGTCCAATCTCCATAGCCACCATCTCCGATGGCTATGACCATGGGGGT 
TTCGAGACAGCTGACTCCATCGACGACTACCTCAAAGACTTTAA^ 

GACATCATCCAAAAACACCAGACTAGTGATAACCCCATCACTTGTATCGTCTATGATGCTTTCCTGCCT 

TGGGCACTTGACGTTGCTAGAGAGTTTGGTTTAGTTGCGACTCCTTTCTTTACGCAGCCTTGTGCTGTT 
AACTATGTTTATTATCTTTCTTACATAAACAA^ 

CTTGAGCTCCAAGATTTGCCTTCTTTCTTCTCTGTTTCTGGCTCTTATCCTG^CTTACT^^GATGGTG 
CTTCAACAGTTCATAAATTTCGAA^^ 

CATGTTAGATCTCTCTCTATCTCTTTCTTACAATTCTTAAACCATCTCTTGTTCTTGTGCATGTACTAA 
CTGCTCTTTTTTTGTTTACAGGAGAATGAATTGTGGTCGAAAGCTTGTCCTGTGTTGACAATTGGTCCA 

TCGAAAGATGATTCCTTCTGCATTAACTGGCTCGACACAAGGCCACAAGGGTCGGTGGTGTACGTAGC^ 

TTCCTGTGGGTGGTCAGATCTTCAGAGGAGGAAAAACTCCCATCAGGGTTTCTTGAGACAGTGAATAAA 

^^ gcttggtcttg ^ tggagtc ctcagcttcaagttctgtcaaacaJgS^ 

TTGACTCACTGTGGCTGGAACTCAACCATGGAGGCTTTGACCTTCGGGGTTCCCATGGTGGCAATGCCC 

caatggactgatcaaccgatgaacgcaaagtacatacaagatgtgtggaaggctS 

gagaggagcaaagagatgaagaagaacgtgaagaaatggagagacttggctgtcaagtcactcaatgaa 
ggaggttctacggatactaacattgatacatttgtatcaagggttcagagcaaatag 
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UGT76E1 Figure 26

ATGGAAGAACTAGGAGTGAAGAGAAGGATAGTATTGGTTCCAGTTCCAGCACAAGGTCATGTAACTCCG 
ATTATGCAACTCGGGAAGGCTCTTTACTCCAAGGGCTTCTCCATCACTGTTGTTCTCACACAGTATAAT 
CGAGTTAGCTCATCCAAGGACTTCTCTGATTTTCATTTCCTCACCATCCCAGGCAGCTTGACCGAGTCT 
GATCTCAAAAACCTTGGACCATTCAAGTTTCTCTTCAAGCTCAATCAAATTTGCGAGGCAAGCTTCAAG
CAATGTATTGGTCAACTATTGCAGGAGCAAGGTAATGATATCGCTTGTGTCGTCTACGATGAGTACATG 
TACTTCTCCCAAGCTGCAGTTAAAGAGTTTCAACTTCCTAGCGTCCTCTTCAGCACGACAAGTGCTACT 
GCCTTTGTCTGTCGCTCTGTTTTGTCTAGAGTCAACGCAGAGTCATTCTTGCTTGACATGAAAGGTACT 
CAAGATTTTTTAGCTTGTTAACTCAAACTTTAAAAGTGCATTTAGGTATATAAACCAATCCAAATGCTG 
TTGTTTGCTTTGCAGATCCCAAAGTGTCAGACAAGGAATTTCCAGGGTTGCATCCGCTAAGGTACAAGG 
ACCTGCCAACTTCAGCATTTGGGCCATTAGAGAGTATACTCAAGGTTTACAGTGAGACTGTCAACATTC 
GAACAGCTTCGGCAGTTATCATCAACTCAACAAGCTGTCTAGAGAGCTCATCTTTGGCATGGTTACAAA 
AACAACTGCAAGTTCCAGTGTATCCTATAGGCCCACTTCACATTGCAGCTTCAGCGCCTTCTAGTTTAC 
TTGAAGAGGACAGGAGTTGCCTTGAGTGGTTGAACAAGCAAAAAATAGGCTCAGTGATTTACATAAGTT 
TGGGAAGCTTGGCTCTAATGGAAACTAAAGACATGTTGGAGATGGCTTGGGGTTTACGTAATAGCAACC 
AACCTTTCTTATGGGTGATCCGACCGGGTTCTATTCCCGGCTCGGAATGGACAGAGTCTTTACCGGAGG 
AATTCAGTAGGTTGGTTTCAGAAAGAGGTTACATTGTGAAATGGGCACCACAGATAGAAGTTCTCAGAC 
ATCCTGCAGTGGGAGGGTTTTGGAGTCACTGCGGATGGAACTCGACCCTAGAGAGCATCGGGGAAGGAG
TTCCGATGATCTGTAGGCCTTTTACGGGAGATCAGAAAGTCAATGCGAGGTACTTAGAGAGAGTTTGGA 
GAATTGGGGTTCAATTGGAAGGAGAGCTGGATAAAGGAACAGTGGAGAGAGCTGTAGAGAGATTGATTA 
TGGATGAAGAAGGAGCAGAAATGAGGAAGAGAGTTATCAACTTGAAAGAGAAGCTTCAAGCCTCTGTCA 

AGAGTAGAGGTTCCTCATTCAGCTCATTAGACAACTTTGTCAATTCCTTAAAAATGATGAATTTCATGT 
AG 
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UGT76E11 Figure 27 

ATGGAGGAAAAGCCGGCGGGCAGAAGAGTAGTGTTGGTTGCAGTTCCAGCTCAAGGACATATCTCTCCA 

ATAATGCAACTTGCAAAAACACTTCACTTGAAGGGTTTCTCAATCACAATCGCTCAGACAAAGTTCAAT 

TACTTTAGCCCTTCAGATGACTTCACTGATTTTCAGTTTGTCACCATTCCAGAAAGCTTACCAGAGTCT 

GATTTTGAGGATCTCGGGCCAATAGAGTTTCTGCATAAGCTCAACAAAGAGTGTCAGGTGAGCTTCAAA 

GACTGTTTGGGTCAGTTGTTGCTGCAACAAGGTAATGAGATAGCCTGTGTTGTCTACGACGAGTTCATG 

TACTTTGCTGAAGCTGCAGCCAAAGAGTTTAAGCTTCCAAACGTCATTTTCAGCACCACAAGTGCCACG 

GCTTTTGTTTGCCGCTCTGCATTCGACAAACTTTATGCAAACAGTATCCTGACTCCCTTGAAAGGTACT 

CTTGAATTCTCTGTCTTCTATTCTTGCTGGTTTCTATAATCTGTAACAGCATGGTTCTTGACCTTTTTG 

CAGAACCCAAAGGACAACAAAACGAGCTAGTGCCAGAGTTTCATCCCCTGAGATGCAAAGACTTTCCGG

TTTCACATTGGGCATCATTAGAAAGCATGATGGAGCTGTATAGGAATACAGTTGACAAACGGACAGCTT 

CCTCGGTGATAATCAACACAGCGAGCTGTCTAGAGAGCTCATCTCTGTCTCGTCTGCAGCAACAGCTAC 

AAATTCCAGTTTATCCTATAGGCCCTCTTCACCTGGTGGCATCAGCTTCTACGAGTCTTCTTGAAGAGA 

ACAAGAGCTGTATTGAATGGTTGAACAAACAAAAGAAAAACTCTGTGATATTCGTAAGCTTGGGAAGCT 

TAGCTTTGATGGAAATCAATGAGGTGATAGAAACTGCTTTGGGATTGGATAGTAGCAAGCAACAGTTCT 

TGTGGGTCATTCGGCCAGGGTCAGTACGTGGTTCGGAATGGATAGAGAACTTGCCTAAGGAGTTTAGTA 

AGATAATTTCGGGTCGAGGTTACATTGTGAAATGGGCTCCACAGAAGGAAGTACTTTCTCATCCTGCAG 

TAGGAGGATTTTGGAGCCATTGCGGATGGAACTCGACACTAGAGAGCATCGGGGAAGGAGTTCCAATGA 

TTTGCAAGCCGTTTTCCAGTGATCAAATGGTGAATGCGAGATACTTGGAGTGTGTATGGAAAATTGGGA 

TTCAAGTTGAGGGTGATCTAGACAGAGGAGCGGTCGAGAGAGCTGTGAGGAGGTTAATGGTGGAGGAAG 

AAGGGGAGGGGATGAGGAAGAGAGCTATCAGTTTGAAAGAGCAACTTAGAGCCTCTGTTATAAGTGGAG 

GTTCTTCACACAACTCGCTAGAGGAGTTTGTACACTACATGAGGACTCTATGA 
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ATGCAGGTTTTGGGAATGGAGGAAAAGCCTGCAAGGAGAAGCGTAGTGTTGGTTCCATTTCCAGCACAA 

GGACATATATCTCCAATGATGCAACTTGCCAAAACCCTTCACTTAAAGGGTTTCTCGATCACAGTTGTT 

CAGACTAAGTTCAATTACTTTAGCCCTTCAGATGACTTCACTCATGATTTTCAGTTCGTCACCATTCCA 

GAAAGCTTACCAGAGTCTGATTTCAAGAATCTCGGACCAATACAGTTTCTGTTTAAGCTCAACAAAGAG 

TGTAAGGTGAGCTTCAAGGACTGTTTGGGTCAGTTGGTGCTGCAACAAAGTAATGAGATCTCATGTGTC 

ATCTACGATGAGTTCATGTACTTTGCTGAAGCTGCAGCCAAAGAGTGTAAGCTTCCAAACATCATTTTC 

AGCACAACAAGTGCCACGGCTTTCGCTTGCCGCTCTGTATTTGACAAACTATATGCAAACAATGTCCAA 

GCTCCCTTGAAAGGTACTCTAAAACTCTCTGTTTCGTGGTTTCCGCGAGTGGCTATAAGATTGAAACAG 

CATTGTTTTTGACCTTTTTTGCAGAAACTAAAGGACAACAAGAAGAGCTAGTTCCGGAGTTTTATCCCT 

TGAGATATAAAGACTTTCCAGTTTCACGGTTTGCATCATTAGAGAGCATAATGGAGGTGTATAGGAATA 

CAGTTGACAAACGGACAGCTTCCTCGGTGATAATCAACACTGCGAGCTGTCTAGAGAGCTCATCTCTGT 

CTTTTCTGCAACAACAACAGCTACAAATTCCAGTGTATCCTATAGGCCCTCTTCACATGGTGGCCTCAG 

CTCCTACAAGTCTGCTTGAAGAGAACAAGAGCTGCATCGAATGGTTGAACAAACAAAAGGTAAACTCGG 

TGATATACATAAGCATGGGAAGCATAGCTTTAATGGAAATCAACGAGATAATGGAAGTCGCGTCAGGAT 

TGGCTGCTAGCAACCAACACTTCTTATGGGTGATCCGACCAGGGTCAATACCTGGTTCCGAGTGGATAG 

AGTCCATGCCTGAAGAGTTTAGTAAGATGGTTTTGGACCGAGGTTACATTGTGAAATGGGCTCCACAGA 

AGGAAGTACTTTCTCATCCTGCAGTAGGAGGGTTTTGGAGCCATTGTGGATGGAACTCGACACTAGAAA 

GCATCGGCCAAGGAGTTCCAATGATCTGCAGGCCATTTTCGGGTGATCAAAAGGTGAACGCTAGATACT 

TGGAGTGTGTATGGAAAATTGGGATTCAAGTGGAGGGTGAGCTAGACAGAGGAGTGGTCGAGAGAGCTG 

TGAAGAGGTTAATGGTTGACGAAGAAGGAGAGGAGATGAGGAAGAGAGCTTTCAGTTTAAAAGAGCAAC 

TTAGAGCCTCTGTTAAAAGTGGAGGCTCTTCACACAACTCGCTAGAAGAGTTTGTACACTTCATAAGGA 
CTCTATGA 
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ATGGAGGAAAAGCAAGTGAAGGAGACAAGGATAGTGTTGGTTCCAGTTCCAGCTCAAGGTCATGTAACT 

CCGATGATGCAACTAGGAAAAGCTCTTCACTCAAAGGGTTTCTCCATCACTGTTGTTCTGACACAGTCT 

AATCGAGTTAGCTCTTCCAAAGACTTCTCTGATTTCCATTTCCTCACCATCCCAGGCAGCTTAACTGAG 

TCTGATCTCCAAAACCTAGGACCACAAAAGTTTGTGCTCAAGCTCAATCAAATTTGTGAGGCAAGCTTC 

AAGCAGTGTATAGGTCAACTATTGCATGAACAATGTAATAATGATATTGCTTGTGTCGTCTACGATGAG 

TACATGTACTTCTCTCATGCTGCAGTAAAAGAGTTTCAACTTCCTAGTGTCGTCTTTAGCACGACAAGT 

GCTACTGCTTTTGTCTGTCGCTCTGTTTTGTCTAGAGTCAACGCAGAGTCGTTCTTGATCGACATGAAA 

GGTATTCAAGATTCTAGCTTGTTTTATCTTAATTCAAAATCCTATTTATAGAAACTAATCCAAATGATC 

GATGTTATCTTTTCAGATCCTGAAACACAAGACAAAGTATTTCCAGGGTTGCATCCTCTGAGGTACAAG 

GATCTACCAACTTCAGTATTTGGGCCAATAGAGAGTACGCTCAAGGTTTACAGTGAGACTGTGAACACT 

CGAACAGCTTCCGCTGTTATCATCAACTCAGCAAGCTGTTTAGAGAGCTCATCTTTGGCAAGGTTGCAA 

CAACAACTGCAAGTTCCGGTGTATCCTATAGGCCCACTTCATATTACAGCTTCAGCGCCTTCTAGTTTA 

CTAGAAGAAGACAGGAGTTGCGTTGAGTGGTTGAACAAGCAAAAATCAAATTCAGTTATTTACATAAGC 

TTGGGAAGCTTGGCTCTAATGGACACCAAAGACATGTTGGAGATGGCTTGGGGATTAAGTAATAGCAAC 

CAACCTTTCTTATGGGTGGTCAGACCGGGCTCTATTCCGGGGTCAGAATGGACAGAGTCCTTACCAGAG 

GAATTCAATAGGTTGGTTTCAGAAAGAGGTTACATTGTGAAATGGGCTCCGCAGATGGAAGTTCTCAGA 

CATCCTGCAGTAGGAGGGTTTTGGAGTCACTGTGGATGGAACTCAACAGTAGAGAGCATCGGGGAAGGA 

GTTCCGATGATATGTAGGCCTTTCACCGGGGATCAGAAAGTCAATGCGAGGTACTTAGAGAGAGTTTGG 

AGAATTGGGGTTCAATTGGAGGGAGATCTGGATAAAGAAACTGTGGAGAGAGCTGTAGAGTGGTTGCTT 

GTGGATGAAGAAGGAGCAGAAATGAGGAAGAGAGCCATTGACTTGAAAGAAAAGATTGAAACCTCTGTT 

AGAAGTGGAGGTTCCTCATGCAGCTCACTAGACGACTTTGTTAATTCCATGTGA 
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gtctctgacggtgttccggagggaaccatgctcgggaatccact^ 

GCGGCTCCACGTATTTTCCGGAGCGAAATCGCGGCGGCAGAGATAGAAGTTGGAAAGAAAGTGA^ 

tttatggttattattattatttatctcctggtac^tgtgagtatggaagS 

GGAATGGAGAATTACAGAGTTAAAGATATACCAGAGGAAGTTGTAT^GAAGATTTGGAC^ 
CCAAAGGCTTTATACCAAATGAGTCTTGCTTTACCTCGTGCCTCTGCT^ 

ACGTTATTATCTTCTACATCGGAGAAAGAGATGCGTGATCCTCATGGCTGCTTTGCTTGGATGGGGAAr 

agatcagctgcttctgtagcgtacattagcttcggcaccgtcatggaacctcctcctgS 
gcgatagcacaagggttggaatcaagcaaagtgccgtttgtttggtcgctgaaggS^ 

CATCTACCAAAAGGGTTTTTGGATCGGACAAGAGAGCAAGGGATAGTGGTTCCTTG^ 

gaactgctgaaacacgaggcaatgggtgtgaatgtgacacattgtggatggSctS 

GTGTCGGCAGGTGTACCGATGATCGGCAGACCGATTTTGGCGGATAATAGGCTCAACGGAAG^ 

gaggttgtgtggaaggttggagtgatgatggataatggag^ 

TTGAATGATGTTTTTGTTCATGATGATGGTAAGACGATGAAGGCTAATGCCAAG^AGCTTAAAG^ 
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ATGAAAGTGAACGAGGAAAACAACAAGCCGACAAAGACCCATGTCTTAATCTTCCCATTTCCGGCrrAA 

CCCACCAAGATCAACGAAGATGACGATAACGAGATCCTCCACT^ 
TACCGTTTTGATCAGATCTCCTCTCTTTACAGAAGTTACGTTCACGGAG^T?^ 
AGAGACTCCTTTAGAGATAACGTGGCGAGTTGGGGACTCGTCGTG^CTCGTTCACCGCCATGGA^ 
GTTTATCTCGAACATCTTAAGCGAGAGATGGGCCATGATCGTGTATCGGCTGTAGGCC^ 

GCACTCGCCTCTGGGCTTGAGAAAAGCGGCGTCCATTTCATATGGGCCGTAAAGGAGCCCGTTGAGAAA 
r? C ^ ACGTGGCMCATCCTGGACGGTTTCGACG ^ 

ggatgggctccacaagtagctgtgctacgtcaccgagccgttggcgcgtttttaacgcactgtggttS 

^CTCTGTGGTGGAGGCGGTTGTCGCCGGCGTTTTGATGCTGACGT^ 
A C?GACCCcScSS 

GGATTTATCCAACATGTCGTTAGTTTAGGACTAAACAAATGA 
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ATGAGCATAGATATTTTTCAAGAAATAAGAATAAAGAAAATTCTACTCTTAATGGCGGAAGCAAACACT 
CCACACATAGCAATCATGCCGAGTCCCGGTATGGGTCACCTTATCCCATTCGTCGAGTTAGCAAAGCGA 
CTCGTTCAGCACGACTGTTTCACCGTCACAATGATCATCTCCGGTGAAACTTCGCCGTCTAAGGCACAA 
AGATCCGTTCTCAACTCTCTCCCTTCCTCCATAGCCTCCGTATTTCTCCCTCCCGCCGATCTTTCCGAT 
GTTCCCTCCACAGCGCGAATCGAAACTCGGGCCATGCTCACCATGACTCGTTCCAATCCGGCGCTCCGG 
GAGCTTTTTGGCTCTTTATCAACGAAGAAAAGTCTCCCGGCGGTTCTCGTCGTCGATATGTTTGGTGCG 
GATGCGTTCGACGTGGCCGTTGACTTCCACGTGTCACCATACATTTTCTATGCATCCAATGCAAACGTC 
TTGTCGTTTTTTCTTCACTTGCCGAAACTAGACAAAACGGTGTCGTGTGAGTTTAGGTACTTAACCGAA 
CCGCTTAAGATTCCCGGCTGTGTCCCGATAACCGGTAAGGACTTTCTTGATACGGTTCAAGACCGAAAC 
GACGACGCATACAAATTGCTTCTCCATAACACCAAGAGGTACAAAGAAGCTAAAGGGATTCTAGTGAAT 
TCCTTCGTTGATTTAGAGTCGAATGCAATAAAGGCCTTACAAGAACCGGCTCCTGATAAACCAACGGTA 
TACCCGATTGGGCCGCTGGTTAACACAAGTTCATCTAATGTTAACTTGGAAGACAAGTTCGGATGTTTA 
AGTTGGCTAGACAACCAACCATTCGGCTCGGTTCTATACATATCATTTGGAAGCGGCGGAACACTTACA 
TGTGAGCAGTTTAATGAGCTTGCTATTGGTCTTGCGGAGAGCGGAAAACGGTTTATTTGGGTCATACGA 
AGTCCAAGCGAGATAGTTAGTTCGTCGTATTTCAATCCACACAGCGAGACAGACCCCTTTTCGTTTTTA 
CCAATTGGGTTCTTAGACCGAACCAAAGAGAAAGGTTTGGTGGTTCCATCATGGGCTCCACAGGTTCAA 
ATCCTGGCTCATCCATCCACATGCGGGTTTTTAACACACTGTGGATGGAATTCGACCTTAGAAAGCATT 
GTAAACGGTGTACCACTCATAGCGTGGCCTTTATTCGCGGAGCAAAAGATGAATACATTGCTACTCGTG 
fiAGGATGTTGGAGCGGCTCTAAGAATCCATGCGGGTGAAGATGGGATTGTACGGAGGGAAGAAGTGGTG 
AGAGTGGTGAAGGCACTGATGGAAGGTGAAGAGGGAAAAGCCATAGGAAATAAAGTGAAGGAGTTGAAA 
GAAGGAGTTGTTAGAGTCTTGGGTGACGATGGATTGTCCAGCAAGTCATTTGGTGAAGTTTTGTTAAAG 
TGGAAAACGCACCAGCGAGATATCAACCAAGAGACGTCCCACTAA 
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